POLYKETIDE-ASSOCIATED SUGAR BIOSYNTHESIS GENES

This application claims the benefit of U.S. Serial No. 08/576,626 filed December 21,
1995, now pending.

Field of the Invention 
The present invention relates to methods for directing the biosynthesis of specific 
polyketide analogs by genetic manipulation. In particular, sugar biosynthesis genes are 
manipulated to produce precise, novel glycosylation-modified macrolides of predicted 
structure. 

Background of the Invention 

Polyketides are a large class of natural products that includes many important 
antibiotic, antifungal, anticancer, and anti-helminthic compounds such as erythromycins, 
amphotericins, daunorubicins, and avermectins. Their synthesis proceeds by an ordered 
condensation of acyl esters to generate carbon chains of varying length, side chain, and 
reduction pattern that are differentially cyclized and subsequently modified to give the mature 
polyketides. For many polyketides, maturation includes the addition of one or more sugar 
residues to the cyclized carbon chain. The sugar residues are frequently critical to the 
biological activity of the mature polyketide. 

Streptomyces and the closely related Saccharopolyspora genera are prodigious 
producers of polyketide metabolites. Because of the commercial significance of these 
compounds, a great amount of effort has been expended in the study of Streptomyces 
genetics. Consequently, much is known about Streptomyces and several cloning vectors exist 
for introducing DNA into these organisms. 

Although many polyketides have been identified, there remains the need to obtain 
novel glycosylation modified (as defined herein) polyketide structures with enhanced 
properties. Current methods of obtaining such molecules include screening of biological 
samples and chemical modification of existing polyketides, both of which are costly and time 
consuming. Current screening methods are based on gross properties of the molecule, i.e.
antibacterial, antifungal activity, etc., and both a priori knowledge of the structure of the
molecules obtained and predetermination of enhanced properties are virtually impossible.
Standard chemical modification of existing structures has been successfully employed, but is 
limited by the number of types of compounds obtainable. Furthermore, the poor yield of 
multistep chemical syntheses often limits the practicality of this approach. The following 
modifications to sugar residues bound to polyketides are particularly difficult or inefficient at
the present time: changing the stereochemistry of specific hydroxyl or methyl groups, changing
the oxidation state of specific hydroxyl groups, and deoxygenation of specific carbons.
Accordingly, there exists a need to obtain molecules wherein such changes are specified and
performed, which would represent an improvement in the technology to produce altered
glycosylation-modified polyketide molecules with predicted structure.

The present invention overcomes these problems by providing the genetic sequence of 
sugar biosynthesis genes involved in the biosynthesis of polyketide-associated sugars. 

Summary of the Invention 
In one aspect, the present invention provides an isolated single or double stranded 
polynucleotide, typically DNA, having a nucleotide sequence which comprises (a) a 
nucleotide sequence selected from the group consisting of (i) the sense sequence of FIG. 4A 
(SEQ ID NO:1) from about nucleotide position 54 to about nucleotide position 1136; (ii) the
sense sequence of SEQ ID NO:1 from about nucleotide position 1147 to about nucleotide
position 2412; (iii) the sense sequence of SEQ ID NO:1 from about nucleotide position 2409
to about nucleotide position 3410; (iv) the sense sequence of FIG. 4B (SEQ ID NO:2) from
about nucleotide position 80 to about nucleotide position 1048; (v) the sense sequence of 
SEQ ID NO:2 from about nucleotide position 1048 to about nucleotide position 2295; (vi) the 
sense sequence of SEQ ID NO:2 from about nucleotide position 2348 to about nucleotide 
position 3061; (vii) the sense sequence of SEQ ID NO:2 from about nucleotide position 3214 
to about nucleotide position 4677; (viii) the sense sequence of SEQ ID NO:2 from about 
nucleotide position 4674 to about nucleotide position 5879; (ix) the sense sequence of SEQ 
ID NO:2 from about nucleotide position 5917 to about nucleotide position 7386; and (x) the 
sense sequence of SEQ ID NO:2 from about nucleotide position 7415 to about nucleotide 
position 7996; (b) sequences complementary to the sequences of (a); (c) sequences that, on 
expression, encode a polypeptide encoded by the sequences of (a); and (d) analogous 
sequences that hybridize under stringent conditions to the sequences of (a) and (b). A 
preferred molecule is a DNA molecule. In another embodiment, the polynucleotide is an 
RNA molecule. 

In another embodiment, a DNA molecule of the present invention is contained in an 
expression vector. The expression vector preferably further comprises an enhancer-promoter 
operatively linked to the polynucleotide. In a preferred embodiment, the DNA molecule in 
the vector is one of the preferred sequences mentioned above. In an especially preferred 
embodiment, the DNA molecule in the vector is the sequence of SEQ ID NO:2 from about 
nucleotide position 80 to about nucleotide position 1048. 

The present invention still further provides for a host cell transformed with a 
polynucleotide or expression vector of this invention. Preferably, the host cell is a bacterial
cell selected from the group consisting of Saccharopolyspora spp., Streptomyces spp. and E.
coli.

WO 97/23630 PCT/US96/20238

The present invention also provides methods to produce novel glycosylation modified
polyketide structures by designing and introducing specified changes in the DNA governing
the synthesis and attachment of sugar residues to polyketides. According to one method, the 
biosynthesis of specific glycosylation-modified polyketides is accomplished by genetic 
manipulation of a polyketide-producing microorganism comprising the steps of isolating a 
sugar biosynthesis gene-containing DNA sequence from those described above; identifying 
within the gene-containing DNA sequence one or more DNA fragments responsible for the 
biosynthesis of a polyketide-associated sugar or its attachment to the polyketide; creating one 
or more specified changes into the DNA fragment or fragments, thereby resulting in an 
altered DNA sequence; introducing the altered DNA sequence into a polyketide-producing 
microorganism to replace the original sequence whereby the altered DNA sequence, when 
translated, results in altered enzymatic activity capable of effecting the production of the 
specific glycosylation-modified polyketide; growing a culture of the altered polyketide- 
producing microorganism under conditions suitable for the formation of the specific 
glycosylation-modified polyketide; and isolating said specific glycosylation-modified 
polyketide from the culture. 

In a second method the biosynthesis of specific glycosylation-modified polyketides is 
accomplished by isolating a sugar biosynthesis gene-containing DNA sequence from
those described above; identifying within the gene-containing DNA sequence one or more 
DNA fragments responsible for the biosynthesis of a polyketide-associated sugar or its 
attachment to the polyketide; reversing the strand orientation of the DNA fragment or 
fragments, thereby resulting in an altered DNA sequence which, when transcribed, results in 
production of an antisense mRNA; introducing the altered DNA sequence into a polyketide- 
producing microorganism having an mRNA capable of binding to the antisense mRNA which 
results in altered enzymatic activity capable of effecting the production of the specific 
glycosylation-modified polyketide; growing a culture of the altered polyketide-producing 
microorganism under conditions suitable for the formation of the specific glycosylation- 
modified polyketide; and isolating the specific glycosylation-modified polyketide from the 
culture.
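
At the sequence level, reversing the strand orientation of a fragment as in the second method corresponds to taking its reverse complement. The following Python sketch illustrates only that relationship. The fragment used is a made-up placeholder, not a portion of SEQ ID NO:1 or SEQ ID NO:2, and the function names are hypothetical.

```python
# Minimal sketch: the reverse complement of a sense-strand fragment is the
# sequence that, when transcribed, would yield an antisense transcript.
# Assumption: only unambiguous bases (A, C, G, T) occur in the fragment.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def is_antisense_of(candidate: str, sense_fragment: str) -> bool:
    """True if candidate is the exact reverse complement of sense_fragment."""
    return candidate.upper() == reverse_complement(sense_fragment)


if __name__ == "__main__":
    sense = "ATGGCCAGC"  # hypothetical placeholder fragment
    inverted = reverse_complement(sense)
    print(inverted)
    print(is_antisense_of(inverted, sense))
```

Applying the function twice returns the original fragment, which is a convenient sanity check when preparing an inverted insert in silico.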

In a third method the biosynthesis of specific glycosylation-modified polyketides is 
accomplished by isolating a sugar biosynthesis gene-containing DNA sequence from
those described above; identifying within the gene-containing DNA sequence one or more 
DNA fragments responsible for the biosynthesis of a polyketide-associated sugar or its 
attachment to the polyketide; introducing the DNA fragment or fragments into a polyketide- 
producing microorganism whereupon transcription and translation of the DNA fragment or 
fragments generate an altered polyketide-producing microorganism that is capable of 
producing the specific glycosylation-modified polyketide; growing a culture of the 
polyketide-producing microorganism containing the DNA fragment or fragments under 
conditions suitable for the formation of the specific glycosylation-modified polyketide; and
isolating the specific glycosylation-modified polyketide from the culture. 

Preferably, the sugar biosynthesis gene-containing DNA sequence of the processes 
described above comprises genes which encode an enzymatic activity involved in the 
biosynthesis of L-mycarose and/or D-desosamine. More preferably, the sugar biosynthesis 
gene-containing DNA sequence comprises the sequence of SEQ ID NO:2 from about 
nucleotide position 80 to about nucleotide position 1048. 

The present invention is especially useful in manipulating sugar biosynthesis genes 
from Streptomyces and Saccharopolyspora, organisms that provide over one-half of the 
clinically useful antibiotics. 

Brief Description of the Drawings 
FIG. 1A illustrates the organization of the erythromycin biosynthetic gene cluster and
the genetic designations of the biosynthetic genes; FIG. 1B illustrates an abbreviated
erythromycin biosynthetic scheme that broadly associates the biosynthetic genes with their
role in erythromycin biosynthesis. Seven eryB genes, eryBI - eryBVII, are responsible for the
biosynthesis of L-mycarose or its attachment to the erythronolide B ring, and six eryC genes,
eryCI - eryCVI, are responsible for the biosynthesis of D-desosamine or its attachment to
3-α-mycarosylerythronolide B. The dashed arrows indicate that the pathway through
erythromycin B is not the principal natural biosynthetic route to erythromycin A. 

FIG. 2 illustrates the proposed scheme for the biosynthesis of L-mycarose and the 
eryB genes responsible for the specific steps. 

FIG. 3 illustrates the proposed scheme for the biosynthesis of D-desosamine and the 
eryC genes responsible for the specific steps. 

FIG. 4A(1-4) illustrates the nucleotide sequence (SEQ ID NO:1) of the sugar
biosynthesis genes eryCII (coordinates 54-1136), eryCIII (coordinates 1147-2412), and
eryBII (coordinates 2409-3410), with corresponding translation of the open reading frames 
(SEQ ID NO:3, SEQ ID NO:4 and SEQ ID NO:5 respectively). Standard one letter codes for 
the amino acids appear beneath their respective nucleic acid codons as described herein. 

FIG. 4B(1-9) illustrates the nucleotide sequence (SEQ ID NO:2) of the sugar
biosynthesis genes eryBIV (coordinates 80-1048), eryBV (coordinates 1048-2295), eryCVI
(coordinates 2348-3061), eryBVI (coordinates 3214-4677), eryCIV (coordinates 4674-5879),
eryCV (coordinates 5917-7386), and eryBVII (coordinates 7415-7996) with corresponding
translation of the putative open reading frames (SEQ ID NO:6, SEQ ID NO:7, SEQ ID
NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11 and SEQ ID NO:12 respectively).
Standard one letter codes for the amino acids appear beneath their respective nucleic acid 
codons as described herein. 

FIG. 5A illustrates the amino acid sequence identity between the sugar biosynthesis
enzyme encoded by the eryBIV gene of Sac. erythraea (SEQ ID NO:6) and the sugar
biosynthesis enzymes encoded by the ascF gene of Yersinia pseudotuberculosis [Thorson et
al., J. Bacteriol., 176:5483 (1994)] (SEQ ID NO:13), the rfbJ gene of Salmonella enterica
[Jiang et al., Mol. Microbiol., 5:695 (1991)] (SEQ ID NO:14), the strL gene of Streptomyces
griseus [Pissowotzki et al., Mol. Gen. Genet. 241:193 (1993)] (SEQ ID NO:15) and the galE
gene of Escherichia coli [Lemaire and Hill, Nucl. Acids Res. 14:7705 (1986)] (SEQ ID
NO:16). In this and all other Figures in which amino acid sequence identity is compared,
capitalized letters represent consensus (identical) amino acids between species or amino acids
which are conservative substitutions for the consensus residues. Also in each Figure, the
sequence identified as "consensus" is merely a convenient representation of conserved amino
acids and is not intended as a representation of any existing polypeptide sequence.

FIG. 5B illustrates the amino acid sequence identity between the sugar biosynthesis
enzyme encoded by the eryBVII gene of Sac. erythraea (SEQ ID NO:12) and the sugar
biosynthesis enzymes encoded by the strM gene of Streptomyces griseus [Pissowotzki et al.,
Mol. Gen. Genet. 241:193 (1993)] (SEQ ID NO:17), the rfbC gene of Salmonella enterica
[Jiang et al., Mol. Microbiol., 5:695 (1991)] (SEQ ID NO:18), the rfbF gene of Yersinia
enterocolitica [Zhang et al., Mol. Microbiol., 9:309 (1993)] (SEQ ID NO:19), and the ascE
gene of Yersinia pseudotuberculosis [Thorson et al., J. Bacteriol., 176:5483 (1994)] (SEQ ID
NO:20).

FIG. 5C illustrates the amino acid sequence identity between the sugar biosynthesis
enzyme encoded by the eryCIV gene of Sac. erythraea (SEQ ID NO:10) and the sugar
biosynthesis enzymes encoded by the eryCI gene of Sac. erythraea [Dhillon et al., Mol.
Microbiol., 3:1405 (1989)] (SEQ ID NO:21), the ascC gene of Yersinia pseudotuberculosis
[Weigel et al., Biochemistry, 31:2129 (1992), Thorson et al., J. Am. Chem. Soc., 115:6993
(1993), Thorson et al., J. Bacteriol., 176:5483 (1994)] (SEQ ID NO:22), the dnrJ gene of
Streptomyces peucetius [Stutzman-Engwall et al., J. Bacteriol., 174:144 (1992)] (SEQ ID
NO:23), the prgl gene of Streptomyces alboniger [Lacalle et al., EMBO J., 11:785 (1992)]
(SEQ ID NO:24), and the strS gene of Streptomyces griseus [Distler et al., Gene, 115:105
(1992)] (SEQ ID NO:25).

FIG. 5D illustrates the amino acid sequence identity between the sugar biosynthesis
enzymes encoded by the eryBV and eryCIII genes of Sac. erythraea (SEQ ID NO:7 and SEQ
ID NO:4 respectively) and the sugar biosynthesis enzyme encoded by the dnrS gene of
Streptomyces peucetius [Otten et al., J. Bacteriol., 177:6688 (1995)] (SEQ ID NO:26).

FIG. 5E illustrates the amino acid sequence identity between the sugar biosynthesis
enzyme encoded by the eryCVI gene of Sac. erythraea (SEQ ID NO:8) and the sugar
biosynthesis enzymes encoded by the srmX gene of Streptomyces ambofaciens [Geistlich et
al., Mol. Microbiol., 6:2019 (1992)] (SEQ ID NO:27), the rdmD gene of Streptomyces
purpurascens [GenBank Accession: U10405] (SEQ ID NO:28) and the glycine
methyltransferase of Rattus norvegicus [Ogawa et al., Eur. J. Biochem. 168:141 (1987)]
(SEQ ID NO:29).

FIGS. 6A through 6D illustrate the compounds conceivably formed in Examples 1-4
respectively and are representative of compounds formed from Type I (FIG. 6A), Type II
(FIG. 6B), and Type III (FIGS. 6C and 6D) alterations.

FIG. 7 illustrates the construction of the expression plasmid pASX2 described in
Example 2. For FIGS. 7-13 the following abbreviations have been used: amp, ampicillin
resistance gene; tsr, thiostrepton resistance gene; ROP, repressor of plasmid synthesis gene;
eryBI, eryBII, eryBIII, eryBIV, eryBV, eryBVI, eryBVII, eryCI, eryCII, eryCIII, eryCIV,
eryCV, and eryCVI, the erythromycin biosynthetic genes involved in the synthesis of
mycarose or its attachment to the macrolide ring (eryB) or the synthesis of desosamine or its
attachment to the macrolide ring (eryC) [the thin arrows above a gene indicate its relative size
and the direction of transcription]; ori-E. coli, an origin of DNA replication that functions in
E. coli, in the specific examples the ColE1 origin; ori-Streptomyces, an origin of DNA
replication that functions in Streptomyces, in the specific examples the pJV1 origin [Servin-
Gonzalez et al., Microbiology, 141:2499 (1995)]; p-ermE*, a modified promoter for the
erythromycin resistance gene; t-fd, the gene VIII transcription terminator of bacteriophage fd;
PCR, polymerase chain reaction. Restriction enzyme sites have been indicated by their
standard commercial names (e.g., BamHI, EcoRI). The abbreviations appended to the
large arrows in the plasmid synthetic schemes summarize each of the steps involved in the
plasmid constructions. These steps are described fully in the relevant Examples.

FIG. 8 illustrates the construction of the eryBVII antisense-expression plasmid
pASBVII described in Example 2.

FIG. 9A illustrates the construction of the carrier plasmid pK1.

FIGS. 9B-E illustrate the construction of plasmid pKB6 which carries all of the eryB
genes and is described in Example 3.

FIG. 10 illustrates the construction of expression plasmid pX1 described in Example 3.

FIG. 11 illustrates the construction of the eryB expression plasmids pXSB6 and pXB6
described in Example 3.

FIGS. 12A-B illustrate the construction of plasmid pKC4 which carries all of the eryC
genes described in Example 4.

FIG. 13 illustrates the construction of the eryC expression plasmids pXSC4 and pXC4 
described in Example 4. 

Detailed Description of the Invention 

I. The Invention 

The present invention provides isolated and purified polynucleotides that encode 
enzymes or fragments thereof responsible for the biosynthesis of polyketide-associated sugars 
or their attachment to polyketides, vectors containing those polynucleotides, host cells 
transformed with those vectors, a process of making novel glycosylated polyketides using 
those polynucleotides and vectors, and isolated and purified recombinant polypeptides and 
polypeptide fragments thereof. 

II. Definitions 

For the purposes of the present invention as disclosed and claimed herein, the 
following terms are defined. 

The term "polyketide" as used herein refers to a large and diverse class of natural 
products, including but not limited to antibiotic, antifungal, anticancer, and anti-helminthic 
compounds. Antibiotics include, but are not limited to anthracyclines and macrolides of 
different types (polyenes and avermectins as well as classical macrolides such as 
erythromycins). 

The term "glycosylated polyketide" refers to any polyketide that contains one or more
sugar residues.

The term "glycosylation-modified polyketide" refers to a polyketide having a changed
glycosylation pattern or configuration relative to that particular polyketide's unmodified or 
native state. 

The term "polyketide-producing microorganism" as used herein includes any
microorganism that can produce a polyketide naturally or after being suitably engineered (i.e.
genetically). Examples of actinomycetes and the polyketides they naturally produce include
but are not limited to those listed in Table 1 below (see Hopwood, D.A. and Sherman, D.H.,
Annu. Rev. Genet., 24:37-66 (1990) incorporated herein by reference).

Table 1

Organism                         Polyketide Produced
Saccharopolyspora erythraea      Erythromycin
Streptomyces ambofaciens         Spiramycin
Streptomyces avermitilis         Avermectin
Streptomyces fradiae             Tylosin
Streptomyces griseus             Candicidin, monactin, griseusin
Streptomyces violaceoniger       Granaticin
Streptomyces thermotolerans      Carbomycin
Streptomyces rimosus             Oxytetracycline
Streptomyces peucetius           Daunorubicin
Streptomyces coelicolor          Actinorhodin
Streptomyces glaucescens         Tetracenomycin
Streptomyces roseofulvus         Frenolicin
Streptomyces cinnamonensis       Monensin
Streptomyces curacoi             Curamycin
Amycolatopsis mediterranei       Rifamycin

Other examples of polyketide-producing microorganisms that produce polyketides 
naturally include various Actinomadura, Dactylosporangium and Nocardia strains.

The term "sugar biosynthesis genes" as used herein refers to sequences of DNA from
Saccharopolyspora erythraea that encode sugar biosynthesis enzymes and is intended to 
include sequences of DNA from other polyketide-producing microorganisms which are 
identical or analogous to those obtained from Saccharopolyspora erythraea. 

The term "sugar biosynthesis enzymes" as used herein refers to polypeptides which 
are involved in the biosynthesis and/or attachment of polyketide-associated sugars and their 
derivatives and intermediates.

The term "polyketide-associated sugar" refers to a sugar that is known to attach to
polyketides or that can be attached to polyketides by the processes described herein. 

The term "sugar derivative" refers to a sugar which is naturally associated with a
polyketide but which is altered relative to the unmodified or native state; examples only 
include N-3-a-desdimethyl D-desosamine, D-mycarose, 4-keto-L-mycarose, 4-keto-D- 
mycarose, 3-desmethyl L-mycarose and 3-desmethyl D-mycarose. 

The term "sugar intermediate" refers to an intermediate compound produced in a 
sugar biosynthesis pathway. 

The term "eryB" as used herein refers to sequences of DNA that encode enzymes 
involved specifically in the biosynthesis of the deoxysugar L-mycarose. 

The term "eryC" as used herein refers to sequences of DNA that encode enzymes 
involved specifically in the biosynthesis of the deoxysugar D-desosamine. 

III. Polynucleotides 

The organization of the segment of the Saccharopolyspora erythraea (Sac. erythraea)
chromosome that determines the biosynthesis of erythromycin and the corresponding genes
that determine the biosynthesis of the sugars L-mycarose and D-desosamine, designated
eryB and eryC, respectively, are shown in FIG. 1A. It is seen that several genes are required
for the biosynthesis of each of the sugars and that these genes are interspersed among one
another. It is predicted that each gene encodes an enzyme that catalyzes one or a few steps in
the biosynthesis of L-mycarose or D-desosamine from thymidine diphospho-4-keto-6-
deoxyglucose (TDP-glucose); these steps are outlined in FIG. 2 and FIG. 3. In the case of
L-mycarose (shown in FIG. 2), these steps include: (1) C-2" deoxygenation, (2) C-2"/C-3"
enoyl reduction, (3) C-5" epimerization, (4) C-3" C-methylation, (5) C-4" keto reduction, and
(6) transfer to erythronolide B. For D-desosamine (shown in FIG. 3), these steps comprise (1)
C-4'/3' isomerization, (2, 3) C-3' deoxygenation and reduction, (4) C-3' amination,
(5, 6) N-3a' N-dimethylation, and transfer to mycarosyl erythronolide B.

This classification of genes (as belonging to either the eryB class or eryC class) was 
determined by first altering the wild type genes of interest in an erythromycin producing 
strain (i.e. in vivo) to inactivate their expression. The erythromycin products resulting from 
such alterations were then analyzed. Genes whose alterations caused an accumulation of 
erythronolide B (indicating a lack of L-mycarose, or failure to attach L-mycarose to the 
erythronolide ring) were classified as eryB genes; genes whose alterations caused an 
accumulation of 3-α-L-mycarosyl erythronolide B (indicating a lack of D-desosamine, or
failure to attach D-desosamine to the 3-α-L-mycarosyl erythronolide B ring) were classified
as eryC genes. Accordingly, it should be noted that all such genes identified herein as eryB
or eryC are involved in the synthesis of L-mycarose or D-desosamine. The predicted
functional activities of the polypeptides encoded by eryB and eryC will be discussed in
further detail below. 

In one aspect then, the present invention provides isolated and purified eryB and eryC 
polynucleotides from Sac. erythraea that encode enzymes involved in the production of 
glycosylated polyketides. A polynucleotide of the present invention that encodes a sugar 
biosynthesis enzyme is an isolated single or double stranded polynucleotide having a 
nucleotide sequence which comprises (a) a nucleotide sequence selected from the group 
consisting of (i) the sense sequence of FIG. 4A (SEQ ID NO:1) from about nucleotide
position 54 to about nucleotide position 1136; (ii) the sense sequence of SEQ ID NO:1 from
about nucleotide position 1147 to about nucleotide position 2412; (iii) the sense sequence of
SEQ ID NO:1 from about nucleotide position 2409 to about nucleotide position 3410; (iv)
the sense sequence of FIG. 4B (SEQ ID NO:2) from about nucleotide position 80 to about 
nucleotide position 1048; (v) the sense sequence of SEQ ID NO:2 from about nucleotide 
position 1048 to about nucleotide position 2295; (vi) the sense sequence of SEQ ID NO:2 
from about nucleotide position 2348 to about nucleotide position 3061; (vii) the sense 
sequence of SEQ ID NO:2 from about nucleotide position 3214 to about nucleotide position 
4677; (viii) the sense sequence of SEQ ID NO:2 from about nucleotide position 4674 to 
about nucleotide position 5879; (ix) the sense sequence of SEQ ID NO:2 from about 
nucleotide position 5917 to about nucleotide position 7386; and (x) the sense sequence of 
SEQ ID NO:2 from about nucleotide position 7415 to about nucleotide position 7996; 

(b) sequences complementary to the sequences of (a),

(c) sequences that, when expressed, encode polypeptides encoded by the sequences of (a), and

(d) analogous sequences that hybridize under stringent conditions to the sequences of (a).

A preferred polynucleotide is a DNA molecule. In another embodiment, the polynucleotide
is an RNA molecule.

The nucleotide sequence and deduced amino acid residue sequences of the sugar 
biosynthesis genes are set forth in FIG. 4A(1-4) and FIG. 4B(1-9). The nucleotide sequences
of FIG. 4A(1-4) (SEQ ID NO:1) and FIG. 4B(1-9) (SEQ ID NO:2) represent full length DNA
clones of the sense strand of two distinct clusters of sugar biosynthesis genes and are
intended to represent both the sense strand (shown on top) and its complement. The amino
acid sequences depicted below the sense strand correspond to polypeptides encoded by a
nucleotide sequence selected from the group consisting of (i) the sense strand of SEQ ID
NO:1 from about nucleotide position 54 to about nucleotide position 1136, (ii) the sense
sequence of SEQ ID NO:1 from about nucleotide position 1147 to about nucleotide position
2412, (iii) the sense sequence of SEQ ID NO:1 from about nucleotide position 2409 to about
nucleotide position 3410, (iv) the sense sequence of SEQ ID NO:2 from about nucleotide
position 80 to about nucleotide position 1048, (v) the sense sequence of SEQ ID NO:2 from
about nucleotide position 1048 to about nucleotide position 2295, (vi) the sense sequence of
SEQ ID NO:2 from about nucleotide position 2348 to about nucleotide position 3061, (vii)
the sense sequence of SEQ ID NO:2 from about nucleotide position 3214 to about nucleotide
position 4677, (viii) the sense sequence of SEQ ID NO:2 from about nucleotide position 4674
to about nucleotide position 5879, (ix) the sense sequence of SEQ ID NO:2 from about
nucleotide position 5917 to about nucleotide position 7386 and (x) the sense sequence of
SEQ ID NO:2 from about nucleotide position 7415 to about nucleotide position 7996. The
polypeptides encoded by the nucleotide sequences of (i)-(x) above are set forth as SEQ ID
NO:3-SEQ ID NO:12
respectively. 
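
The coordinate ranges recited above can be collected into a small lookup table for working with the sequence listings. The Python sketch below is illustrative only: the gene-to-coordinate assignments are taken from the descriptions of FIG. 4A and FIG. 4B, coordinates are treated as 1-based and inclusive (an assumption), and the names `Orf`, `ORFS`, `orf_length`, `extract` and `overlaps` are hypothetical helpers rather than part of any existing tool.

```python
from typing import NamedTuple


class Orf(NamedTuple):
    seq_id: int  # 1 for SEQ ID NO:1, 2 for SEQ ID NO:2
    start: int   # first nucleotide, 1-based inclusive (assumed convention)
    end: int     # last nucleotide, 1-based inclusive


# Gene names and coordinates as recited for FIG. 4A and FIG. 4B.
ORFS = {
    "eryCII": Orf(1, 54, 1136),
    "eryCIII": Orf(1, 1147, 2412),
    "eryBII": Orf(1, 2409, 3410),
    "eryBIV": Orf(2, 80, 1048),
    "eryBV": Orf(2, 1048, 2295),
    "eryCVI": Orf(2, 2348, 3061),
    "eryBVI": Orf(2, 3214, 4677),
    "eryCIV": Orf(2, 4674, 5879),
    "eryCV": Orf(2, 5917, 7386),
    "eryBVII": Orf(2, 7415, 7996),
}


def orf_length(orf: Orf) -> int:
    """Number of nucleotides spanned by the coordinate range."""
    return orf.end - orf.start + 1


def extract(sequence: str, orf: Orf) -> str:
    """Slice a 1-based inclusive range out of a 0-based Python string."""
    return sequence[orf.start - 1 : orf.end]


def overlaps(a: Orf, b: Orf) -> bool:
    """True if two ranges on the same sequence share at least one position."""
    return a.seq_id == b.seq_id and a.start <= b.end and b.start <= a.end


if __name__ == "__main__":
    for name, orf in ORFS.items():
        print(name, orf_length(orf), orf_length(orf) % 3 == 0)
```

Under the inclusive convention every listed range spans a whole number of codons, and three adjacent pairs (eryCIII/eryBII, eryBIV/eryBV and eryBVI/eryCIV) share a few nucleotides at their boundaries.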

The present invention also contemplates analogous DNA sequences which hybridize 
under stringent hybridization conditions to the DNA sequences set forth above. Stringent 
hybridization conditions are well known in the art and define a degree of sequence identity 
greater than about 80%-90%. The modifier "analogous" refers to those nucleotide sequences 
that encode analogous polypeptides (i.e. in relation to a sugar biosynthesis enzyme), 
analogous polypeptides being those which have only conservative differences and which 
retain the conventional characteristics and activities of sugar biosynthesis enzymes. (A more 
detailed description of analogous polypeptides is provided below). The present invention 
also contemplates naturally occurring allelic variations and mutations of the DNA sequences 
set forth above so long as those variations and mutations code, on expression, for a sugar
biosynthesis enzyme of this invention as set forth hereinafter.

As is well known in the art, because of the degeneracy of the genetic code, there are 
numerous other DNA and RNA molecules that can code for the same polypeptides as those 
encoded by the aforementioned sugar biosynthesis genes and fragments thereof. The present 
invention, therefore, contemplates those other DNA and RNA molecules which, on 
expression, encode the polypeptides of SEQ ID NO:3-SEQ ID NO:11 or fragments thereof.
Having identified the amino acid residue sequence encoded by a sugar biosynthesis gene, and 
with knowledge of all triplet codons for each particular amino acid residue, it is possible to 
describe all such encoding RNA and DNA sequences. DNA and RNA molecules other than
those specifically disclosed herein, which molecules are characterized simply by a
change in a codon for a particular amino acid, are within the scope of this invention.

The 20 common amino acids and their representative abbreviations, symbols and 
codons are well known in the art (see for example, Molecular Biology of the Cell, Second 
Edition, B. Alberts et al., Garland Publishing Inc., New York and London, 1989). As is also
well known in the art, codons constitute triplet sequences of nucleotides in mRNA molecules
and as such, are characterized by the base uracil (U) in place of base thymine (T) which is
present in DNA molecules. A simple change in a codon for the same amino acid residue
within a polynucleotide will not change the structure of the encoded polypeptide. By way of
example, it can be seen from SEQ ID NO:1 that an AGC codon for serine exists at nucleotide
positions 126-128 and again at positions 420-422 and 561-563. However, it can also be seen
from that same sequence that serine can be encoded by a TCG codon (see, e.g., nucleotide
positions 192-194) and a TCC codon (see, e.g., nucleotide positions 204-206). Substitution of
the latter codons for serine with the AGC codon for serine, or vice versa, does not
substantially alter the DNA sequence of SEQ ID NO:1 and results in production of the same
polypeptide. Similarly, substitutions of the recited codons with other equivalent
codons can be made without departing from the scope of the present
invention. 
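
The serine example can be checked with a short translation routine. The Python sketch below assumes the standard genetic code and does not handle alternative start codons. The nine-nucleotide test sequences are made-up placeholders rather than excerpts from SEQ ID NO:1, and `translate` and `is_synonymous` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str) -> str:
    """Translate a DNA or mRNA string in frame 1; '*' marks a stop codon."""
    dna = sequence.upper().replace("U", "T")  # mRNA carries U in place of T
    usable = len(dna) - len(dna) % 3          # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_synonymous(seq_a: str, seq_b: str) -> bool:
    """True if both sequences encode the same polypeptide."""
    return translate(seq_a) == translate(seq_b)


if __name__ == "__main__":
    original = "ATGAGCGCC"  # hypothetical: Met-Ser-Ala with serine as AGC
    variant = "ATGTCGGCC"   # same placeholder with serine recoded as TCG
    print(translate(original), translate(variant))
    print(is_synonymous(original, variant))
```

Because AGC, TCG and TCC all map to serine in the table, exchanging one for another leaves the translated polypeptide unchanged, which is the property relied on in the paragraph above.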

A polynucleotide of the present invention can also be an RNA molecule. An RNA
molecule contemplated by the present invention is complementary to or hybridizes under 
stringent conditions to any of the DNA sequences set forth above. Exemplary and preferred 
RNA molecules are mRNA molecules that encode sugar biosynthesis enzymes of this 
invention. 

IV. Polypeptides 

In another aspect, the present invention provides polypeptides which are reasonably 
believed to be sugar biosynthesis enzymes. A sugar biosynthesis enzyme of the present 
invention is a polypeptide of about 21 kdal to about 47 kdal. As set forth in FIGS. 5A-5E,
analogs of the predicted polypeptides encoded by certain eryB and eryC genes have been 
identified in various species and their sequences compared using the PRETTY routine 
(Genetics Computer Group (GCG) Sequence Analysis Software Package, Madison, WI). 
Due to the degree of amino acid sequence identity existing between the polypeptides of these 
other sugar biosynthesis genes and the polypeptides encoded by the eryB and eryC genes, 
certain enzymatic activities can reasonably be attributed to the eryB and eryC polypeptides. 

By way of example, analogs of the polypeptide encoded by the eryBIV gene have
been identified in Yersinia pseudotuberculosis, Salmonella enterica, Streptomyces griseus and
Escherichia coli (see FIG. 5A). The various analogs have been identified with from 290-328
amino acid residues and are characterized by a low degree of amino acid sequence identity.
(For example, the identity between the sugar biosynthesis enzyme encoded by the eryBIV
gene of Sac. erythraea and the sugar biosynthesis enzyme encoded by the galE gene of E.
coli is 20% at the amino acid level.) However, a conserved amino acid sequence motif, G x x
G x x G (where G represents the amino acid glycine and x represents any other amino acid
residue), is found within the first 30 amino acid residues of all analogs shown. Since the
polypeptide encoded by the galE gene has been shown to be an epimerase whose mechanism
includes a ketoreduction (Bauer et al., Proteins 12:372 (1992)), the eryBIV gene product is
reasonably predicted to be a ketoreductase.
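
A motif search of the kind described above can be sketched as follows. The two N-terminal sequences are hypothetical and are not taken from FIG. 5A, and the function name and the 30-residue window parameter are assumptions made for illustration. The pattern treats x as any residue other than glycine, following the definition given in the text.

```python
import re

# G-x-x-G-x-x-G, where x is any residue other than glycine.
MOTIF = re.compile(r"G[^G]{2}G[^G]{2}G")

def find_motif(protein: str, window: int = 30):
    """Return the 1-based start of the first motif lying entirely within
    the first `window` residues, or None if no such motif is present."""
    match = MOTIF.search(protein[:window])
    return match.start() + 1 if match else None

# Hypothetical N-terminal sequences (illustration only).
with_motif = "MKILVTGAAGFIGSHLVRRLLERDHEVVVLD"
without_motif = "MSKPLAEELRRAVDELLTRYPAEKLVVLDN"

print(find_motif(with_motif), find_motif(without_motif))
```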

As set forth in FIG. 5B, analogs of the sugar biosynthesis enzyme encoded by the
eryBVII gene have been identified in Streptomyces griseus, Salmonella enterica, Yersinia
enterocolitica and Yersinia pseudotuberculosis. The various analogs have been identified with
from 183-200 amino acid residues and are characterized by a moderate degree of amino acid
identity. By way of example, the identity at the amino acid level between the sugar
biosynthesis enzyme encoded by the eryBVII gene of Sac. erythraea and the sugar
biosynthesis enzyme encoded by the rfbC gene of Salmonella enterica or the strM gene of
Streptomyces griseus is 37% and 61%, respectively. Furthermore, a common characteristic
of these particular polypeptides (including that of eryBVII) is that they are only associated
with L-sugar biosynthesis and not with D-sugar biosynthesis. Thus the gene product of
eryBVII is reasonably predicted to function as a C-5 epimerase which converts the
stereochemistry of the sugar from the "D" configuration to the "L" configuration.
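
Percent identity figures such as those quoted above are computed from a pairwise alignment. The sketch below assumes one common convention, namely that identity is counted over aligned columns in which neither sequence has a gap; alignment programs differ in the denominator they use, and the text does not state which convention applies. The aligned strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical aligned fragments (illustration only).
row_a = "MKV-LTGA"
row_b = "MRVALT-A"
print(round(percent_identity(row_a, row_b), 1))
```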

As set forth in FIG. 5C, analogs of the sugar biosynthesis enzyme encoded by the
eryCIV gene have been identified in Sac. erythraea and Yersinia pseudotuberculosis. The
predicted amino acid sequences of the protein products of eryCI and
eryCIV share 34% sequence identity with each other, and 27% and 25% identity, respectively, with the
predicted amino acid sequence encoded by ascC from Yersinia pseudotuberculosis. The
enzyme encoded by ascC has been shown to remove a hydroxyl group located at the C-3
position of L-ascarylose (Liu and Thorson, Annu. Rev. Microbiol. 48:223 (1994)). Thus, at
least one of the polypeptides encoded by eryCI or eryCIV is predicted to be an enzyme which
functions in deoxygenation reactions.

Furthermore, the enzyme encoded by the ascC gene requires the biochemical cofactor 
pyridoxamine, which is the same cofactor used in biochemical transamination reactions. 
Consequently, it has been proposed that some protein analogs (such as dnrJ from
Streptomyces peucetius, prg1 from Streptomyces alboniger and strS from Streptomyces
griseus) having a moderate degree of sequence similarity to the polypeptide encoded by ascC
function as transaminases in amino sugar biosynthesis (Thorson et al., J. Am. Chem. Soc.
115:6993 (1993)). Since the biosynthesis of D-desosamine requires both deoxygenation and
transamination, it is reasonable to predict that at least one of the polypeptides encoded by the 
eryCI or eryCIV genes functions in transamination reactions. 

As set forth in FIG. 5D, the predicted polypeptides encoded by eryBV and eryCIII
share 43% identity at the amino acid level and, as such, may be assumed to have similar
activities with respect to their particular sugars. However, as shown in FIGS. 2 and 3, there
are no common steps in the proposed pathways of L-mycarose and D-desosamine
biosynthesis. Rather than having similar sugar biosynthesis functions, these polypeptides are
predicted to be nucleotidyl-sugar transferases which (in Sac. erythraea at least) function to
attach L-mycarose and D-desosamine to erythronolide B and 3-α-mycarosylerythronolide B,
respectively.

As set forth in FIG. 5E, analogs of the polypeptide encoded by the eryCVI gene have
been identified in Streptomyces ambofaciens, Streptomyces purpurascens, and Rattus
norvegicus. The various analogs have been identified with from 237-293 amino acid residues
and are characterized by a low to moderate degree of amino acid identity. By way of
example, the identity between the polypeptide encoded by the eryCVI gene of Sac. erythraea
and the glycine methyltransferase of Rattus norvegicus is 26% at the amino acid level.
Furthermore, these sugar biosynthesis enzymes share a common sequence motif,
LDVACGTG (SEQ ID NO:30 = amino acid positions 64-71 in the consensus sequence in
FIG. 5E), with rat glycine methyltransferase, whose biochemical function is known (Ogawa et
al., Eur. J. Biochem. 168:141 (1987)). Thus these polypeptides are predicted to be
N-methyltransferases.

In another aspect, the present invention provides a recombinant C-4" ketoreductase
from Sac. erythraea. A recombinant Sac. erythraea C-4" ketoreductase of the present
invention is a polypeptide of about 322 or fewer amino acid residues. A preferred recombinant
Sac. erythraea C-4" ketoreductase is that encoded by the nucleotide sequence of SEQ ID 
NO:2 from about nucleotide position 80 to about nucleotide position 1048. 

The present invention also contemplates amino acid residue sequences that are 
substantially duplicative of the sequences set forth herein such that those sequences 
demonstrate like biological activity to disclosed sequences. Such contemplated sequences 
include those analogous sequences characterized by a minimal change in amino acid residue 
sequence or type (e.g., conservatively substituted sequences), which insubstantial change does
not alter the fundamental nature and biological activity of the aforementioned sugar 
biosynthesis enzymes. 

It is well known in the art that modifications and changes can be made in the structure 
of a polypeptide without substantially altering the biological function of that peptide. For 
example, certain amino acids can be substituted for other amino acids in a given polypeptide 
without any appreciable loss of function. In making such changes, substitutions of like amino 
acid residues can be made on the basis of relative similarity of side-chain substituents, for 
example, their size, charge, hydrophobicity, hydrophilicity, and the like.

As detailed in United States Patent No. 4,554,101, incorporated herein by reference, 
the following hydrophilicity values have been assigned to amino acid residues: Arg (+3.0); 
Lys (+3.0); Asp (+3.0); Glu (+3.0); Ser (+0.3); Asn (+0.2); Gln (+0.2); Gly (0); Pro (-0.5);
Thr (-0.4); Ala (-0.5); His (-0.5); Cys (-1.0); Met (-1.3); Val (-1.5); Leu (-1.8); Ile (-1.8); Tyr
(-2.3); Phe (-2.5); and Trp (-3.4). It is understood that an amino acid residue can be
substituted for another having a similar hydrophilicity value (e.g., within a value of plus or
minus 2.0) and still obtain a biologically equivalent polypeptide.

In a similar manner, substitutions can be made on the basis of similarity in 
hydropathic index. Each amino acid residue has been assigned a hydropathic index on the 
basis of its hydrophobicity and charge characteristics. Those hydropathic index values are: 
Ile (+4.5); Val (+4.2); Leu (+3.8); Phe (+2.8); Cys (+2.5); Met (+1.9); Ala (+1.8); Gly (-0.4);
Thr (-0.7); Ser (-0.8); Trp (-0.9); Tyr (-1.3); Pro (-1.6); His (-3.2); Glu (-3.5); Gln (-3.5); Asp
(-3.5); Asn (-3.5); Lys (-3.9); and Arg (-4.5). In making a substitution based on the
hydropathic index, a value of within plus or minus 2.0 is preferred. 
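
The plus-or-minus 2.0 criterion can be applied mechanically to either scale. The sketch below encodes the hydrophilicity and hydropathic index values listed above; the function name and its interface are assumptions made for illustration, and passing the check indicates only that a substitution falls within the stated window, not that the resulting polypeptide is in fact biologically equivalent.

```python
# Hydrophilicity values as listed in the text (per U.S. Patent No. 4,554,101).
HYDROPHILICITY = {
    "Arg": 3.0, "Lys": 3.0, "Asp": 3.0, "Glu": 3.0, "Ser": 0.3,
    "Asn": 0.2, "Gln": 0.2, "Gly": 0.0, "Pro": -0.5, "Thr": -0.4,
    "Ala": -0.5, "His": -0.5, "Cys": -1.0, "Met": -1.3, "Val": -1.5,
    "Leu": -1.8, "Ile": -1.8, "Tyr": -2.3, "Phe": -2.5, "Trp": -3.4,
}

# Hydropathic index values as listed in the text.
HYDROPATHY = {
    "Ile": 4.5, "Val": 4.2, "Leu": 3.8, "Phe": 2.8, "Cys": 2.5,
    "Met": 1.9, "Ala": 1.8, "Gly": -0.4, "Thr": -0.7, "Ser": -0.8,
    "Trp": -0.9, "Tyr": -1.3, "Pro": -1.6, "His": -3.2, "Glu": -3.5,
    "Gln": -3.5, "Asp": -3.5, "Asn": -3.5, "Lys": -3.9, "Arg": -4.5,
}

def within_window(scale: dict, residue_a: str, residue_b: str, window: float = 2.0) -> bool:
    """True if the two residues differ by no more than `window` on the given scale."""
    return abs(scale[residue_a] - scale[residue_b]) <= window

print(within_window(HYDROPHILICITY, "Leu", "Ile"))  # difference of 0.0
print(within_window(HYDROPATHY, "Lys", "Ile"))      # difference of 8.4
```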

V. Production of novel glycosylated polyketides 

In another aspect, the present invention comprises a general procedure for producing 
novel polyketide structures in vivo by selectively altering, inactivating, or augmenting the 
genetic information of the organism that naturally produces a related polyketide. That is, in 
the present invention, novel polyketides of desired structure are produced by manipulation of 
the eryB and/or eryC genes followed by their introduction into various polyketide-producing 
microorganisms. These manipulations result in the formation of "glycosylation-modified" 
polyketides (i.e. polyketides having an altered glycosylation pattern or configuration relative 
to their native state). For example, "glycosylation-modified" polyketides are those which
have additional sugar groups attached (where none previously existed), have different sugars (such
as sugar intermediates) attached in place of the natural sugars, or lack sugar groups (at
positions where sugar groups previously existed).

In the case of Type I and Type II alterations (further described below),
glycosylation-modified polyketides may arise through mechanisms which cause either (1) the
non-production of the sugar attachment enzyme (i.e. the enzyme involved in attachment of a sugar
to the polyketide structure) or (2) the non-production of a sugar biosynthesis enzyme. In
the first instance, the sugar will not be attached to the polyketide since the enzyme which
functions to attach the sugar will be lacking. In the second situation, a sugar intermediate
from the biosynthesis pathway will be produced (depending on which enzyme is lacking) and
attached to the polyketide provided it is recognized as a suitable substrate by the sugar
attachment enzyme; alternatively, it will not be recognized and therefore not attached. In the
case of Type III alterations (also described in detail below), glycosylation-modified
polyketides arise via attachment of additional or different sugars (i.e. not normally found in a
particular polyketide-producing strain) to the polyketide. It should be noted that these
postulated mechanisms are simply provided to enhance understanding of the novel processes
described herein; the actual mechanisms by which the Type I, II and III alterations produce
glycosylation-modified polyketides are not presently known.

In the first type of alteration (referred to herein as Type I alterations), genetically
altered eryB and/or eryC genes are introduced into the chromosome of Sac. erythraea or
another glycosylated polyketide-producing organism that also produces L-mycarose,
D-desosamine, or their closely related derivatives such as mycaminose (4-hydroxy
D-desosamine). The genetic alteration of an eryB and/or eryC gene is such that it causes a
non-functional enzyme to be synthesized. Once introduced into an appropriate strain, the altered
gene replaces its corresponding wild type gene, causing the strain to lose the ability to
produce a particular enzymatic activity involved in sugar biosynthesis. As a result, a
glycosylation-modified polyketide is produced via either of the mechanisms previously
described for a Type I alteration.

In a Type I change described herein, a specific mutation in an eryB and/or eryC gene 
of the Sac. erythraea chromosome is accomplished by a three step process which involves: 
1) specifically altering the DNA sequence of a desired sugar biosynthesis gene, 2) subcloning
the altered sequence into a suitable vector capable of recombining in the chromosome of an 
appropriate host and 3) introducing the vector containing the subcloned sequence into the 
appropriate host so that exchange of the wild type allele with the mutated one will occur. The 
first step is accomplished using standard recombinant DNA techniques to effect a deletion, 
base pair conversion or frame-shift in the DNA sequence. The second step, which also 
employs standard recombinant techniques, involves subcloning the altered sequence into a 
vector which does not replicate in Sac. erythraea or the desired host. In the final step, the 
vector is introduced into a suitable host, where, by the process of gene replacement, the
altered allele replaces the wild-type one. All techniques employed in a Type I change are 
well known to those of ordinary skill in the art. 

Example 1 illustrates the process of gene replacement of an eryB gene. As Example 1 
shows, the eryB gene of interest is mutated and, along with adjacent upstream and
downstream DNA sequences, cloned into a non-replicating Sac. erythraea plasmid vector. 
The vector carrying the mutated allele and adjoining DNA is then introduced into the host 
strain by the process of protoplast transformation. Transformants are regenerated under 
selective conditions (i.e. conditions that require expression of a particular plasmid marker) in 
order to induce recombination of the plasmid into the host cell chromosome. In other words, 
since the plasmid does not replicate autonomously, it must reside in the chromosome to be 
maintained in the cell and to express a particular marker under selective conditions. Insertion 
is achieved when the regenerated cells undergo a single homologous recombination between 
one of the two DNA segments that flank the mutation on the plasmid and its homologous 
counterpart in the chromosome. The cells are then grown without selection for the marker,
which induces plasmid loss from the chromosome. This loss arises after the cells have
undergone a second recombination between the second DNA segment that flanks the
mutation and its homologous chromosomal counterpart. This second recombinational event
results in the loss of the plasmid sequences and the wild type allele from the chromosome; the
mutant allele, however, is retained.
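
The two recombination events can be followed with a toy model in which the chromosome and the plasmid are represented as ordered lists of labeled segments. The labels ("up", "A", "B", "vector", and so on) and the function names are hypothetical and chosen only for illustration; "A" and "B" stand for the two DNA segments flanking the mutation. This is a bookkeeping sketch and makes no claim about recombination frequencies.

```python
def integrate(chromosome: list, plasmid: list, arm: str) -> list:
    """Single crossover between `arm` on the circular plasmid and `arm` on the
    chromosome: the whole plasmid is inserted, duplicating the flanking arms."""
    c = chromosome.index(arm)
    p = plasmid.index(arm)
    # Open the circular plasmid just after the recombining arm and splice it in.
    opened = plasmid[p + 1:] + plasmid[:p + 1]
    return chromosome[:c + 1] + opened + chromosome[c + 1:]

def resolve(integrant: list, arm: str) -> list:
    """Second crossover between the two direct repeats of `arm`: everything
    between the repeats, plus one copy of the repeat, is lost."""
    first = integrant.index(arm)
    second = integrant.index(arm, first + 1)
    return integrant[:first + 1] + integrant[second + 1:]

wild_type = ["up", "A", "wt", "B", "down"]
plasmid = ["A", "mut", "B", "vector"]  # non-replicating plasmid carrying the mutant allele

integrant = integrate(wild_type, plasmid, "A")   # insertion via the first flanking segment
mutant = resolve(integrant, "B")                 # resolution via the second flanking segment
revertant = resolve(integrant, "A")              # resolution via the same segment

print(integrant)
print(mutant)
print(revertant)
```

In this model, resolution through the second flanking segment leaves the mutant allele in the chromosome, whereas resolution through the same segment used for insertion restores the wild type arrangement, which is why resolvants are screened by Southern hybridization.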

In a variation of a Type I change, the non-production of the sugar biosynthesis 
enzyme (or attachment enzyme) may be achieved by the alternative mechanisms of promoter 
inactivation and/or transcriptional terminator insertion. These variations do not affect the
gene sequence itself but rather regulatory mechanisms involved in gene transcription.
"Promoter" as used herein refers to that region of a DNA molecule which controls the
initiation of RNA transcription. Such regions are known to bind RNA polymerases (i.e. the
enzymes involved in synthesizing RNA molecules). This form of Type I change (i.e.
promoter inactivation) involves the two steps of 1) identifying the promoter region of the desired
gene and 2) rendering the promoter region inoperable by mutation. As in the replacement
mechanism described above, such mutations may be effected by creating deletions in the
promoter sequence or by base pair conversion. In the case where the promoter controls
transcription of a single gene, inactivation of the promoter will eliminate expression of that 
particular gene; of course, where the promoter controls expression of an entire operon (i.e. a 
series of genes whose expression is controlled by a single promoter), promoter inactivation 
will effectively eliminate expression of all genes in that operon. 

In a similar manner, the non-production of a sugar biosynthesis enzyme (or 
attachment enzyme) may arise from inserting a transcriptional terminator upstream from the 
gene to be inactivated. A "transcriptional terminator" as used herein is a nucleotide sequence 
which signals RNA polymerase to cease transcription. An example of a transcriptional 
terminator is a palindromic sequence capable of forming a stem-loop structure that is 
followed by a stretch of U residues (for example, the transcriptional terminator that follows
gene VIII of bacteriophage fd (Beck and Zink, Gene, 16:35 (1981))). Effecting a change in
expression of a sugar biosynthesis gene by this process involves 1) identifying the gene or
genes of interest (in the case of an operon arrangement) to be inactivated and 2) cloning a
transcriptional terminator sequence in a region of the DNA upstream from such gene(s). A
transcriptional terminator will cause the polymerase involved in RNA transcription to stop (at
or near the signaling region), thereby preventing transcription of any downstream sequences.
Thus, changes such as promoter inactivation and transcriptional terminator insertion, which directly
affect expression of sugar biosynthesis genes, are also intended to be within the scope of the
invention. 
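
The structural definition of a terminator given above (an inverted repeat able to form a stem-loop, followed by a run of U residues in the transcript) can be expressed as a simple sequence scan. The sketch below works on the DNA coding strand, so the U run appears as a T run. The test sequence is modeled on the stem-loop region of the t-fd oligonucleotides of Example 2 and should be treated as illustrative; the function name, the parameter defaults, and the requirement of a perfect (mismatch-free) stem are all assumptions, not properties stated in the text.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def find_terminator(dna: str, min_stem: int = 6, min_loop: int = 3,
                    max_loop: int = 8, min_t: int = 4):
    """Return (stem_start, left_arm, loop, right_arm) for the longest perfect
    hairpin immediately followed by at least `min_t` T residues, or None."""
    best = None
    for loop_start in range(len(dna)):
        for loop_len in range(min_loop, max_loop + 1):
            right_start = loop_start + loop_len
            n = 0  # paired positions, extended outward from the loop
            while (loop_start - n - 1 >= 0 and right_start + n < len(dna)
                   and PAIR[dna[loop_start - n - 1]] == dna[right_start + n]):
                n += 1
            tail = dna[right_start + n:right_start + n + min_t]
            if n >= min_stem and tail == "T" * min_t and (best is None or n > len(best[1])):
                best = (loop_start - n,
                        dna[loop_start - n:loop_start],
                        dna[loop_start:right_start],
                        dna[right_start:right_start + n])
    return best

# Illustrative sequence: flank + left arm + loop + right arm start + T run + flank.
sequence = "ATACAATT" + "AAAGGCTCC" + "TTTT" + "GGAGCC" + "T" * 9 + "GGAGA"
print(find_terminator(sequence))
```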

In the second case (referred to herein as Type II alterations), eryB and/or eryC genes
are arranged on a vector in an antisense orientation relative to a promoter capable of allowing
expression of the gene in Sac. erythraea or Streptomyces. The vector is then introduced into
a polyketide-producing microorganism. As a result of this vector construction, antisense
messenger RNA (mRNA) is produced which interferes with the translation of the wild-type
mRNA. As with the Type I manipulation, novel glycosylation-modified polyketides will
be produced in which the normal mycarose, desosamine, and/or closely related sugar residue
is lacking or is substituted by a sugar intermediate.

In a Type II change, inactivation of the eryB and/or eryC genes by antisense
expression is accomplished by a two step procedure in which (1) a specific sugar biosynthesis
gene is subcloned into an expression vector in an antisense (i.e. reverse) orientation; and (2)
the antisense expression vector is introduced into the desired strain. The first step is
accomplished using standard recombinant DNA techniques employing either E. coli or
Streptomyces as the host, and an expression vector (capable of replicating in either host) that
can be assembled to contain a Streptomyces promoter. Streptomyces promoters may be
obtained from any commercially available Streptomyces plasmids or Streptomyces-E. coli
shuttle plasmids. In step 2, the antisense expression vector is introduced into a suitable
Streptomyces strain and the transformed cells are grown under selective conditions in order to
maintain the expression plasmid in the cell.

As described in Example 2, the gene to be inactivated is subcloned in its reverse
orientation downstream of a Streptomyces promoter (which is contained within a replicating
Sac. erythraea plasmid). The plasmid carrying the antisense gene is then introduced into the
host strain by protoplast transformation. Transformants are regenerated under selective
conditions in order to maintain the autonomously replicating plasmid in the cells. Subsequent
expression of the antisense gene causes the production of an antisense messenger RNA
(mRNA) that is complementary to the mRNA of the native allele of the selected gene.
Through standard nucleotide base pair interactions, the antisense mRNA and the native
mRNA form an RNA duplex that occludes the ribosome binding site of the native mRNA.
This interaction prevents ribosomal translation of the native mRNA and the corresponding
synthesis of the enzyme encoded by that mRNA. In this way, specific enzymatic steps in
sugar biosynthesis corresponding to the identity of the gene expressed in the antisense
orientation are blocked, leading to the production of novel sugar intermediates which, when
attached to the polyketide ring of the host microorganism, give rise to novel
glycosylation-modified polyketides. Alternatively, the antisense expression vector can be constructed using
a non-replicating Sac. erythraea vector that includes flanking DNA from a nonessential
region of the Sac. erythraea chromosome, such as the region immediately upstream from the
eryK gene (FIG. 1). This vector can then be used to stably insert the antisense construction
into the chromosome by homologous recombination in a fashion similar to that described for
the construction of a Type I alteration.
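
The relationship between the native mRNA and the antisense transcript can be sketched in a few lines. The gene fragment below is hypothetical (a ribosome binding site followed by a start codon, invented for illustration), and the function names are assumptions. The antisense transcript is simply the reverse complement of the sense transcript, which is why the two can pair along their full length, including the ribosome binding site.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def sense_mrna(coding_strand: str) -> str:
    """mRNA read from a fragment cloned in its native orientation."""
    return coding_strand.upper().replace("T", "U")

def antisense_rna(coding_strand: str) -> str:
    """Transcript produced when the same fragment is cloned in reverse orientation."""
    return "".join(RNA_PAIR[base] for base in reversed(sense_mrna(coding_strand)))

def forms_duplex(mrna: str, antisense: str) -> bool:
    """True if the antisense strand, read 3'->5', pairs with every base of the mRNA."""
    return len(mrna) == len(antisense) and all(
        RNA_PAIR[m] == a for m, a in zip(mrna, reversed(antisense))
    )

# Hypothetical 5' region: ribosome binding site (GGAGG) upstream of an ATG start codon.
fragment = "AAGGAGGTGATCCATGACCGAT"
mrna = sense_mrna(fragment)
anti = antisense_rna(fragment)
print(mrna, anti, forms_duplex(mrna, anti))
```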

In the third case (referred to herein as Type III alterations), novel glycosylation-modified
polyketides of desired structure are produced by arranging all or a subset of the
eryB and/or eryC genes on a replicating vector and introducing these genes en bloc into a
"distinct" polyketide-producing organism, i.e. one other than the microorganism from which
the eryB and/or eryC genes were taken. As an example, eryB and/or eryC genes may be
taken from Sac. erythraea and introduced into Streptomyces violaceoniger or Streptomyces
venezuelae. In this case, mycarose, desosamine, their biochemical intermediates and/or their
closely related derivatives will be synthesized and attached at specific positions to polyketide
compounds that do not necessarily carry these, or any, sugar residues. Some examples of
novel glycosylated polyketides that may be produced in hosts that carry such manipulations
are shown in FIG. 6.

In Type III changes, the genes for the biosynthesis of mycarose and/or desosamine are
introduced into a polyketide-producing organism other than Sac. erythraea by another simple
two step procedure: 1) all or a subset of the eryB and/or eryC genes are assembled together on
a replicating plasmid downstream of a Streptomyces promoter; and 2) the plasmid is
introduced into the polyketide-producing organism. Step 1 requires standard recombinant
DNA manipulations employing E. coli and/or Streptomyces as the host. Step 2 requires one
or more plasmids out of the several Streptomyces vectors or E. coli-Streptomyces shuttle
vectors available, one or more promoters that function in Streptomyces, and a selection for
the presence of the strain carrying the plasmid. As described in Examples 3 and 4, sets of the
eryB and/or eryC genes are sequentially subcloned together on a replicating vector
downstream of a suitable promoter that functions in the desired host. The plasmid carrying
the grouped genes is then introduced into the host strain by electroporation or by
transformation of protoplasts employing selection for a plasmid marker.

GENERAL METHODS 

Materials, Plasmids, and Bacterial Strains

Restriction endonucleases, T4 DNA ligase, competent E. coli DH5α cells, X-gal,
IPTG and plasmids pUC18, pUC19, and pBR322 were purchased from Bethesda Research
Laboratories (BRL), Gaithersburg, MD. VentR® DNA polymerase was purchased from New
England Biolabs (Beverly, MA). Plasmids pGEM®5Zf, pGEM®7Zf, and pGEM®11Zf were
from Promega, Madison, WI; plasmids pIJ4070 and pIJ702 were obtained from the John
Innes Institute, Norwich, England; and plasmids pWHM3 and pWHM4 (J. Bacteriol.
171:5872 (1989)) were obtained from C. R. Hutchinson, University of Wisconsin, Madison, WI.
[α-32P]dCTP, Hybond™-N nylon membranes, and Megaprime nick translation kits were
from Amersham Corp., Chicago, IL. SeaKem® LE agarose and SeaPlaque® low gelling
temperature agarose were from FMC Bioproducts, Rockland, ME. E. coli K12 strains
carrying the E. coli-Sac. erythraea shuttle plasmids pWHM3 and pWHM4 (Vara et al., J.
Bacteriol. 171:5872 (1989)) and pAIX have been deposited at the Agricultural Research
Culture Collection (NRRL), 1815 N. University Street, Peoria, Illinois 61604, as of
December 5, 1995, under the terms of the Budapest Treaty and will be maintained for a
period of thirty (30) years from the date of deposit, or for five (5) years after the last request
for the deposit, or for the enforceable period of the U.S. patent, whichever is longer.
Plasmids pWHM3, pWHM4 and pAIX were accorded the accession numbers NRRL
B-21512, NRRL B-21513 and NRRL B-21514, respectively. Sac. erythraea strain NRRL2338
is also available from the Agricultural Research Service culture collection. Staphylococcus
aureus ThR (thiostrepton resistant) was obtained by plating 10^8 cells of S. aureus on agar
medium containing 10 µg/ml thiostrepton and picking a survivor after 48 hr growth at 37°C.
Thiostrepton was obtained from Sigma Chemical, St. Louis, MO. All other chemicals and 
reagents were from standard commercial sources unless otherwise specified. 

DNA Manipulations 

Standard conditions were employed for restriction endonuclease digestion, agarose
gel electrophoresis, isolation of DNA fragments from low melting agarose gels, DNA
ligation, plasmid isolation from E. coli by alkaline lysis, and transformation of E. coli
employing selection for ampicillin resistance (150 µg/ml) on LB agar plates (Sambrook et al.,
Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, Plainview,
NY, 1989). Total DNA from Sac. erythraea and Streptomyces species (including S. fradiae,
S. celestes, S. violaceoniger, S. hygroscopicus, S. venezuelae) was prepared according to
described procedures (Hopwood et al., Genetic Manipulation of Streptomyces, A Laboratory
Manual, John Innes Foundation, Norwich, UK (1985)). Transfer of DNA from agarose gels
to Hybond™-N membranes and Southern analysis using Megaprime™ nick translated probes
were performed according to the manufacturer's instructions.

Amplification of DNA Fragments 

Synthetic deoxyoligonucleotides were synthesized on an ABI Model 380A
synthesizer (Applied Biosystems, Foster City, CA) following the manufacturer's
recommendations. Amplification of DNA fragments was performed by the polymerase chain
reaction (PCR) using a Perkin Elmer GeneAmp® PCR System 9600. Reactions contained
100 pmol of each primer, 1 µg of template DNA (chromosomal DNA from Sac. erythraea
NRRL2338), and 2 units VentR® DNA polymerase in a 100 µl volume of PCR buffer (10 mM KCl,
10 mM (NH4)2SO4, 20 mM Tris-HCl (pH 8.8 at 25°C), 2.5 mM MgSO4, 0.1% Triton® X-100)
containing dATP (200 µM), dTTP (200 µM), dCTP (250 µM), and dGTP (250 µM).
The reaction mixture was subjected to 30 cycles. Each cycle consisted of one period of 35
sec at 96°C and one period of 2 min at 72°C. The reaction products were visualized and
purified from low melting agarose. The PCR primers described in the examples were derived
from the nucleotide sequence of the eryB and eryC genes of FIG. 4.
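
For convenience, the quantities stated above can be converted into working concentrations and a nominal run time. The sketch uses only the figures given in the text; the variable and function names are assumptions, and the run time counts the programmed hold periods only, ignoring temperature ramping.

```python
def micromolar(amount_pmol: float, volume_ul: float) -> float:
    """pmol per microliter is numerically equal to micromolar."""
    return amount_pmol / volume_ul

# 100 pmol of each primer in a 100 µl reaction.
primer_uM = micromolar(100, 100)

# dNTP concentrations as stated (µM); dCTP and dGTP are supplied in slight excess.
dntp_uM = {"dATP": 200, "dTTP": 200, "dCTP": 250, "dGTP": 250}
total_dntp_uM = sum(dntp_uM.values())
gc_fraction = (dntp_uM["dCTP"] + dntp_uM["dGTP"]) / total_dntp_uM

# 30 cycles of 35 sec at 96°C plus 2 min at 72°C.
cycles = 30
hold_minutes = cycles * (35 + 2 * 60) / 60

print(primer_uM, total_dntp_uM, round(gc_fraction, 3), hold_minutes)
```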

Transformation and Gene Replacement in Sac. erythraea

Protoplasts of Sac. erythraea strains were prepared and transformed with miniprep
DNA isolated from E. coli according to published procedures (Yamamoto et al., J.
Antibiotics, 39:1304 (1986)). Non-integrative transformants, in the case of pWHM4
derivatives, were selected by regenerating the protoplasts and overlaying with thiostrepton
(final concentration 20 µg/ml) as described (Weber et al., Gene, 68:173 (1988)). Integrative
transformants, in the case of pWHM3 derivatives, were selected on thiostrepton-containing
agar plates (15 µg/ml) as described by Weber et al., Gene, 68:173 (1988). Loss of the ThR
phenotype was monitored after two rounds of non-selective growth in SGGP media
(Yamamoto et al., J. Antibiotics, 39:1304 (1986)) followed by protoplasting and serial
dilution on non-selective agar media. Regenerated protoplasts were replica plated on
thiostrepton-containing media. ThS (thiostrepton-sensitive) colonies arose at a frequency of
10" K Retention of the mutant allele was established by Southern hybridization of several 
ThS colonies.

Fermentation 

Sac. erythraea or Streptomyces cells are inoculated into 100 ml SCM medium (1.5%
soluble starch, 2.0% Difco Soytone, 0.15% Yeast Extract, 0.01% CaCl2) and allowed to grow 
for 3 to 6 days. The entire culture is then inoculated into 10 liters of fresh SCM medium. 
The fermenter is operated for a period of 4 to 7 days at 32°C maintaining constant aeration 
and pH at 7.0. After the fermentation is complete, the cells are removed by centrifugation at 
4°C and the fermentation beer is kept cold until further use. When antibiotic selection to 
maintain a plasmid, such as pXC4 or pXB6, is required, thiostrepton (10 µg/ml) is added to
both the 100 ml starter culture and the 10-liter fermenter. 

The invention will be better understood in connection with the following examples, 
which are intended as an illustration of and not a limitation upon the scope of the invention. 
Both below and throughout the specification, it is intended that citations to the literature be 
expressly incorporated by reference. 

Example 1: Construction and characterization of Sac. erythraea ERBIV that produces
4"-deoxy-4"-oxo-erythromycin A

A. Construction of Plasmid pRBIV: A 4.3 kb PstI-HindIII fragment, which included
the eryBIV gene, was isolated from the plasmid pAIX5 and subcloned into PstI-HindIII
digested pUC19 to generate plasmid pUCBIV. After transformation and isolation of the
plasmid from E. coli, the identity of pUCBIV was confirmed by digestion with MunI, which
released a fragment of 370 bp. Plasmid pUCBIV was then cut with the restriction enzyme
NcoI, the restriction site filled in with Klenow enzyme, and the plasmid religated to generate
plasmid pNCOBIV (which now carried a frameshift mutation in the eryBIV gene). After
transformation and isolation of the plasmid from E. coli, the identity of pNCOBIV was
confirmed by digestion with NsiI and HindIII, which released a fragment of 1.59 kb. (The
NsiI site was formed by the fill-in and religation of the NcoI site.) Finally, plasmid
pNCOBIV was digested with HindIII and SstI and the 3.2 kb fragment carrying the altered
eryBIV gene was isolated and ligated into HindIII and SstI digested pWHM3 to generate
plasmid pRBIV. After transformation and isolation of the plasmid from E. coli, the identity
of pRBIV was confirmed by digestion with KpnI, which released fragments of 5.2 kb, 4.4 kb,
and 0.72 kb.
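
The effect of the NcoI fill-in step can be checked on paper or with a short sketch. NcoI cuts C^CATGG and leaves a four-nucleotide 5' overhang (CATG); filling in both ends and religating duplicates the overhang, giving CCATGCATGG. The result is a four base pair insertion (a frameshift), loss of the NcoI site, and creation of an NsiI site (ATGCAT). The flanking sequence below is hypothetical and is not taken from the eryBIV gene; the function name and its parameters are likewise assumptions.

```python
NCOI = "CCATGG"  # cut C^CATGG, leaving the 5' overhang CATG
NSII = "ATGCAT"

def fill_in_and_religate(dna: str, site: str = NCOI,
                         cut_offset: int = 1, overhang: int = 4) -> str:
    """Cut at the single occurrence of `site`, fill in the 5' overhangs, and
    blunt-end ligate. The overhang sequence ends up duplicated (top strand shown)."""
    pos = dna.index(site)
    left = dna[:pos + cut_offset + overhang]   # left end after fill-in
    right = dna[pos + cut_offset:]             # right end, overhang included
    return left + right

# Hypothetical sequence containing one NcoI site (illustration only).
before = "AAGCTTGACCATGGCTGAATTC"
after = fill_in_and_religate(before)

print(after)
print(len(after) - len(before), NCOI in after, NSII in after)
```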

B. Construction of Sac. erythraea ERBIV: Sac. erythraea protoplasts were
transformed with plasmid pRBIV and integrative transformants selected as described in
General Methods. Resolution of the integrants by nonselective growth as described in
General Methods yielded Sac. erythraea ERBIV, in which the wild type copy of the eryBIV
gene was replaced with the inactive mutant copy. Gene replacement was confirmed by
Southern analysis of NcoI digested Sac. erythraea DNA and NcoI-NsiI digested Sac.
erythraea DNA using the 1.58 kb NcoI-HindIII fragment isolated from plasmid pUCBIV
(coordinates 68 1-2214, FIG. 4B) as a probe. Wild type Sac. erythraea and wild type
resolvants display a hybridizing DNA fragment of 2.75 kb when digested with either NcoI or
NcoI-NsiI, whereas Sac. erythraea strain ERBIV is characterized by hybridization to either a
16 kb DNA fragment or a 2.75 kb DNA fragment when digested with NcoI or NcoI-NsiI,
respectively.

C. Isolation, purification, and properties of 4"-deoxy-4"-oxo-erythromycin A from 
Sac. erythraea ERBIV: Sac. erythraea strain ERBIV is fermented for 4 days in SCM media 
as described in General Methods. The fermentation broth of Sac. erythraea ERBIV is then 
cooled to 4°C and adjusted to pH 4.0 and extracted once with methylene chloride. The 
aqueous layer is readjusted to pH 9.0 and extracted twice with methylene chloride and the 
combined basic methylene chloride extracts are concentrated to a solid residue. This is 
digested in methanol and chromatographed over a column of Sephadex LH-20 in methanol. 
Fractions are tested for bioactivity against a sensitive organism, such as Staphylococcus 
aureus ThR, and active fractions are combined. The combined fractions are concentrated and 
the residue is digested in 10 ml of the upper phase of a solvent system consisting of 
n-heptane, benzene, acetone, isopropanol, and 0.05 M, pH 7.0 aqueous phosphate buffer 
(5:10:3:2:5, v/v/v/v/v), and chromatographed on an Ito Coil Planet Centrifuge in the same 
system. Active fractions are combined, concentrated and partitioned between methylene 
chloride and dilute ammonium hydroxide (pH 9.0). The methylene chloride layer is 
separated and concentrated to yield the desired product as a white foam. 
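
As a worked illustration of the 5:10:3:2:5 (v/v/v/v/v) solvent ratio, the sketch below computes the volume of each component for a chosen total volume. The 1000 ml batch size and the function name are assumptions made for the example; the text specifies only the ratio and that the upper phase of the equilibrated mixture is used.

```python
# Parts by volume, in the order given in the text.
SOLVENT_PARTS = {
    "n-heptane": 5,
    "benzene": 10,
    "acetone": 3,
    "isopropanol": 2,
    "0.05 M phosphate buffer, pH 7.0": 5,
}


def component_volumes(total_ml):
    """Split a total volume among the components according to the ratio."""
    total_parts = sum(SOLVENT_PARTS.values())
    return {
        name: total_ml * parts / total_parts
        for name, parts in SOLVENT_PARTS.items()
    }


# Hypothetical 1000 ml batch (the batch size is not given in the text).
for name, volume in component_volumes(1000).items():
    print(f"{name}: {volume:.0f} ml")
```

These are the volumes mixed before the phases separate; the volume of upper phase actually recovered is not predicted by this calculation.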

Example 2: Construction and characterization of Sac. erythraea ER720(pASBVII) that 
produces 3-α-D-mycarosyl-5-β-D-desosaminoyl-12-hydroxy-erythronolide B 

A. Construction of plasmid pASX2 (see FIG. 7): The 290 bp EcoRI-BamHI segment 
carrying the ermE* promoter is isolated from plasmid pIJ4070 and ligated into EcoRI-BamHI 
digested pWHM4 DNA to form pASX1. After transformation and isolation of the plasmid 
from E. coli, the identity of pASX1 is confirmed by digestion with ApaLI which releases 
fragments of 3.9 kb, 2.5 kb, 1.2 kb, 0.5 kb, and 0.4 kb. Two oligonucleotides of the 
sequences: SEQ ID NO:31 
(5'-GATCCAGCGTCTGCAGGCATGCTCTAGATACAATTAAAGGCTCCTTTTGGAGCCTTTTTTTTTGGAGATTTTCAACGT-3') 
and SEQ ID NO:32 
(5'-AGCTACGTTGAAAATCTCCAAAAAAAAAGGCTCCAAAAGGAGCCTTTAATTGTATCTAGAGCATGCCTGCAGACGCTG-3'), 
corresponding to the (+) and (-) strands of the bacteriophage fd gene VIII transcription 
terminator (t-fd) (Beck et al. (1978) Nucl. Acids Res. 5:4495) and including restriction 
enzyme sites for the enzymes PstI, SphI, and XbaI, and overhanging ends compatible with 
BamHI and HindIII, are synthesized, and approximately 250 ng of each oligonucleotide are 
then mixed together in TE buffer and heated to 99°C for 1 min. The solution is cooled slowly 
to room temperature, allowing the oligonucleotides to anneal due to their complementarity, 
and the annealed oligonucleotides are then ligated into BamHI-HindIII digested pASX1 to 
give pASX2. After transformation and isolation of the plasmid from E. coli, the identity of 
pASX2 is confirmed by DNA sequencing of the 1.2 kb EcoRI-SalI fragment that contains the 
ermE* promoter and the bacteriophage fd terminator. 

B. Construction of plasmid pASBVII (see FIG. 8): The 598 base pair DNA segment 
that carries the eryBVII gene, comprising coordinates 7398-7996 (FIG. 4B), is amplified by 
PCR employing two oligonucleotides, SEQ ID NO:33 
(5'-GATCGCATGCTCTAGAGTACGTGAGCTGGCGGTGGCGGGC-3') and SEQ ID NO:34 
(5'-GATCCGGATCCGCATGCTTCACCTGCCGGTGCTGGCGGG-3'). After digestion of 
the purified PCR product with BamHI-XbaI the PCR fragment is ligated to BamHI-XbaI 
digested pASX2 to give pASBVII. After transformation and isolation of the plasmid from 
E. coli, the identity of pASBVII is verified by DNA sequencing of the 880 bp EcoRI-XbaI 
insert. 

C. Construction of Sac. erythraea ER720(pASBVII): Sac. erythraea strain ER720 
protoplasts are transformed with plasmid pASBVII and transformants are selected for with 
thiostrepton (15 µg/ml). To confirm transformation, total DNA is isolated from ThR colonies 
and used to transform E. coli. After transformation and isolation of the plasmid from E. coli, 
the identity of pASBVII is verified by restriction analysis with the enzymes PvuII and BamHI 
which releases a 1.48 kb fragment. Those Sac. erythraea colonies that are found to contain 
pASBVII are designated Sac. erythraea ER720(pASBVII). 

D. Isolation, purification, and properties of 3-α-D-mycarosyl-5-β-D-desosaminoyl- 
12-hydroxy-erythronolide B from Sac. erythraea ER720(pASBVII): Sac. erythraea 
ER720(pASBVII) is fermented for 3 days in SCM media with thiostrepton selection as 
described in General Methods. The fermentation broth is then cooled to 4°C and adjusted to 
pH 4.0 and extracted once with methylene chloride. The aqueous layer is readjusted to pH 
9.0 and extracted twice with methylene chloride and the combined extracts are concentrated 
to a solid residue. This is digested in methanol and chromatographed over a column of 
Sephadex LH-20 in methanol. Fractions are tested for bioactivity against a sensitive 
organism, such as Staphylococcus aureus ThR, and active fractions are combined. The 
combined fractions are concentrated and the residue is digested in 10 ml of the upper phase of 
a solvent system consisting of n-heptane, benzene, acetone, isopropanol, and 0.05 M, pH 7.0 
aqueous phosphate buffer (5:10:3:2:5, v/v/v/v/v), and chromatographed on an Ito Coil Planet 
Centrifuge in the same system. Active fractions are combined, concentrated and partitioned 
between methylene chloride and dilute ammonium hydroxide (pH 9.0). The methylene 
chloride layer is separated and concentrated to yield the desired product as a white foam. 

Example 3: Construction and characterization of Streptomyces antibioticus ATCC 
11891(pXB6) that produces 3-des-oleandrosyl-3-mycarosyl oleandomycin 

A. Construction of plasmid pKB6 and intermediates (see FIG. 9) 

i) Construction of plasmid pK1: The DNA sequences of pBR322 (GenBank 
Accession #: J01749) and pUC19 (GenBank Accession #: X02514) are known. The 805 nt 
DNA segment comprising coordinates 1673 through 2478 of pBR322 is amplified by PCR 
employing two oligodeoxynucleotides, SEQ ID NO:35 
(5'-GATCACATGTTCTTTCCTGCGTTATCCCCTG-3') and SEQ ID NO:36 
(5'-GATCGGATCCATGCATGTCTAGAGCATCGCAGGATGCTGCTGGC-3'). After digestion 
of the purified PCR product with AflIII and BamHI the fragment is ligated into AflIII and 
BamHI digested pUC19 to give plasmid pK1. The identity of plasmid pK1, after 
transformation and isolation from E. coli, is verified by PvuII digestion which releases 
fragments of 0.55 kb and 2.55 kb. Plasmid pK1 contains the ROP region of pBR322 that 
controls plasmid copy number. 

ii) Construction of plasmid pKB1: The 2.24 kb DNA segment that carries the 
eryBIV and eryBV genes, comprised between coordinates 56 and 2296 of the sequence 
presented in SEQ ID NO:2, is amplified by PCR employing two deoxyoligonucleotides, 
SEQ ID NO:37 (5'-GAATGCATCCTGGAAAGCGAGCAAATGCTCCGGTG-3') and SEQ 
ID NO:38 (5'-GATCTAGAGCTAGCCGGCGTGGCGGCGCGTG-3'). After digestion with 
NsiI and XbaI the fragment is ligated into NsiI and XbaI digested pK1 to yield plasmid pKB1, 
5.3 kb in size. The identity of plasmid pKB1, after transformation and isolation from E. coli, 
is verified by KpnI digestion which releases fragments of 0.72 kb, 1.14 kb and 3.42 kb. 

iii) Construction of plasmid pKB2: The 1.56 kb DNA segment that carries 
the eryBVI gene, comprised between coordinates 3121 and 4677 of the sequence presented in 
SEQ ID NO:2, is amplified by PCR employing two deoxyoligonucleotides, SEQ ID NO:39 
(5'-GATCGCTAGCCGTGACCGGACCCTTACAGTGAGTG-3') and SEQ ID NO:40 
(5'-GATCTAGACTTAAGTCATCCGGCGGTCCTGGTGTAGACGGC-3'). After digestion 
with NheI and XbaI the fragment is ligated into NheI and XbaI digested pKB1 to give plasmid 
pKB2, 6.9 kb in size. The identity of plasmid pKB2, after transformation and isolation from 
E. coli, is confirmed by BamHI digestion which releases fragments of 0.22 kb, 0.40 kb, 2.6 
kb and 3.7 kb. 

iv) Construction of plasmid pKB3: The 0.6 kb DNA segment that carries the 
eryBVII gene, comprised between coordinates 7385 and 7987 of the sequence presented in 
SEQ ID NO:2, is amplified by PCR employing two deoxyoligonucleotides, SEQ ID NO:41 
(5'-GATCTTAAGAACCGGAGTTGCGAGTACGTGAGCTGGCG-3') and SEQ ID NO:42 
(5'-GATCTAGACCTAGGTCACCTGCCGGTGCTGGCGGGCTC-3'). After digestion with 
AflII and XbaI the fragment is ligated into AflII and XbaI digested pKB2 giving plasmid 
pKB3, 7.5 kb in size. The identity of plasmid pKB3, after transformation and isolation from 
E. coli, is verified by PstI digestion which releases fragments of 1.1 kb and 6.4 kb. 

v) Construction of plasmid pKB4: The 1.0 kb DNA segment that carries the 
eryBII gene, comprised between coordinates 2385 and 3410 of the sequence presented in 
SEQ ID NO:1, is amplified by PCR employing two deoxyoligonucleotides, SEQ ID NO:43 
(5'-GATCCTAGGCCGCAGGAAGGAGAGAACCACG-3') and SEQ ID NO:44 
(5'-GATCTAGATTAATCACTGCAACCAGGCTTCCGGC-3'). Following digestion with 
AvrII and XbaI the fragment is ligated into AvrII and XbaI digested pKB3 yielding the desired 
plasmid pKB4. After transformation and isolation of the plasmid from E. coli, the identity of 
pKB4, 8.5 kb in size, is verified by BglII and EcoRI digestion which releases fragments of 
0.41 kb, 1.6 kb, 3.1 kb and 3.4 kb. 
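
The primers in steps (ii) through (v) each carry, near the 5' end, the recognition site of the enzyme used to digest the PCR product. The sketch below checks this for SEQ ID NO:37 through SEQ ID NO:44. It is an illustration only: the dictionary layout and function name are assumptions, the primer sequences are copied from the text, and the recognition sequences are the standard ones for NsiI, XbaI, NheI, AflII and AvrII.

```python
# Standard recognition sequences of the enzymes named in steps (ii)-(v).
SITES = {
    "NsiI": "ATGCAT",
    "XbaI": "TCTAGA",
    "NheI": "GCTAGC",
    "AflII": "CTTAAG",
    "AvrII": "CCTAGG",
}

# SEQ ID NO -> (primer sequence from the text, enzyme used on that end).
PRIMERS = {
    37: ("GAATGCATCCTGGAAAGCGAGCAAATGCTCCGGTG", "NsiI"),
    38: ("GATCTAGAGCTAGCCGGCGTGGCGGCGCGTG", "XbaI"),
    39: ("GATCGCTAGCCGTGACCGGACCCTTACAGTGAGTG", "NheI"),
    40: ("GATCTAGACTTAAGTCATCCGGCGGTCCTGGTGTAGACGGC", "XbaI"),
    41: ("GATCTTAAGAACCGGAGTTGCGAGTACGTGAGCTGGCG", "AflII"),
    42: ("GATCTAGACCTAGGTCACCTGCCGGTGCTGGCGGGCTC", "XbaI"),
    43: ("GATCCTAGGCCGCAGGAAGGAGAGAACCACG", "AvrII"),
    44: ("GATCTAGATTAATCACTGCAACCAGGCTTCCGGC", "XbaI"),
}


def site_position(primer, enzyme):
    """0-based index of the enzyme's site in the primer, or -1 if absent."""
    return primer.find(SITES[enzyme])


for seq_id, (primer, enzyme) in PRIMERS.items():
    print(seq_id, enzyme, site_position(primer, enzyme))

# The reverse primers 38, 40 and 42 also carry, just inside the XbaI site,
# the site used to open the plasmid in the following step.
NEXT_STEP_SITE = {38: "NheI", 40: "AflII", 42: "AvrII"}
for seq_id, enzyme in NEXT_STEP_SITE.items():
    print(seq_id, enzyme, site_position(PRIMERS[seq_id][0], enzyme))
```

Read this way, each cloning step both adds a gene and introduces the unique site into which the next gene is inserted, which is how the genes come to be stacked in a fixed order.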

vi) Construction of plasmid pKB5: The DNA sequence of eryBIII has been 
reported (Haydock et al. (1991) Mol Gen Genet 230:120). The 1.3 kb DNA segment that 
carries the eryBIII gene, comprised between coordinates 3965 and 5232 of the sequence 
depicted in Haydock et al., is amplified by PCR employing two deoxyoligonucleotides, SEQ 
ID NO:45 (5'-GATTAATTGGCCGCGGCGCCGCGCTCGTTATG-3') and SEQ ID NO:46 
(5'-GATCTAGATAATTAATCATACGACTTCCAGTCGGGGTAG-3'). After digestion 
with MseI and XbaI the fragment is ligated into MseI and XbaI digested pKB4 to give the 
desired plasmid pKB5, 9.8 kb in size. The identity of pKB5, after transformation and 
isolation from E. coli, is verified by PstI digestion which releases fragments of 1.1 kb, 2.5 kb, 
and 6.1 kb, visualized by gel electrophoresis. 

vii) Construction of plasmid pKB6: The eryBI gene has been mapped 
(Haydock et al. (1991) Mol Gen Genet 230:120) and the DNA sequence on both flanks of 
eryBI is known (Haydock et al. (1991) Mol Gen Genet 230:120; GenBank Accession # 
M11200). The 2.5 kb DNA segment that carries the eryBI gene, comprised between 
coordinates 1.1 and 3.6 of the map presented in Haydock et al., is amplified by PCR 
employing two deoxyoligonucleotides: SEQ ID NO:47 
(5'-GATTAATTAATGATCAAGCTGAAAATTGTTTGCATG-3') and SEQ ID NO:48 
(5'-GATCTAGACTGCCGGCTCAGCCTTCCCAGGTTCG-3'). After digestion with PacI and 
XbaI the fragment is ligated into PacI and XbaI digested pKB5 to give plasmid pKB6, 
12.3 kb in size. The identity of pKB6, after transformation and isolation from E. coli, is 
verified by BamHI digestion which releases fragments of 0.22 kb, 0.40 kb, 1.4 kb, 2.6 kb, 
3.3 kb and 4.4 kb. Plasmid pKB6 
carries all of the eryB genes, eryBI-eryBVII, that are involved in the biosynthesis of mycarose 
and its attachment to the polyketide. 
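
A simple consistency check on the diagnostic digests in steps (ii) through (vii) is that the released fragments should add up to the stated plasmid size. The sketch below performs that check; the 0.15 kb tolerance and the function name are assumptions made for the example, and all sizes are taken from the text.

```python
# Plasmid -> (stated size in kb, diagnostic digest fragments in kb).
DIGESTS = {
    "pKB1": (5.3, [0.72, 1.14, 3.42]),               # KpnI
    "pKB2": (6.9, [0.22, 0.40, 2.6, 3.7]),           # BamHI
    "pKB3": (7.5, [1.1, 6.4]),                       # PstI
    "pKB4": (8.5, [0.41, 1.6, 3.1, 3.4]),            # BglII + EcoRI
    "pKB5": (9.8, [1.1, 2.5, 6.1]),                  # PstI
    "pKB6": (12.3, [0.22, 0.40, 1.4, 2.6, 3.3, 4.4]),  # BamHI
}


def discrepancy_kb(stated_kb, fragments_kb):
    """Sum of fragment sizes minus the stated plasmid size, in kb."""
    return round(sum(fragments_kb) - stated_kb, 2)


for name, (stated, fragments) in DIGESTS.items():
    diff = discrepancy_kb(stated, fragments)
    status = "ok" if abs(diff) <= 0.15 else "check"
    print(f"{name}: stated {stated} kb, fragments sum to "
          f"{sum(fragments):.2f} kb ({status})")
```

Because fragment sizes read from a gel are approximate, agreement to within about 0.1 kb is the most that should be expected from such a check.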

B. Construction of Plasmid pXSB6 (see FIG. 11): The 9.2 kb NsiI-XbaI segment of 
pKB6, prepared as described in Example 3(A)(vii) above, that carries all of the eryB genes is 
isolated and ligated into PstI-XbaI digested pASX2, prepared as described in Example 2(A) 
above, to give plasmid pXSB6. After transformation and isolation of the plasmid from E. 
coli, the identity of pXSB6, 17.2 kb in size, is verified by the observation of fragments of 
0.41 kb, 1.9 kb, and 14.9 kb after EcoRI digestion. Plasmid pXSB6 carries all of the eryB 
genes in a transcriptional fusion downstream of the ermE* promoter on an E. coli- 
Streptomyces shuttle plasmid. 

C. Construction of Plasmid pXB6 

i) Construction of plasmid pN702 (see FIG. 10): Two oligonucleotides of the 
sequences: SEQ ID NO:49 (5'-GGAATTCAGATCTATGCATTCTAGAA-3') and 
SEQ ID NO:50 (5'-CGCGTTCTAGAATGCATAGATCTGAATTCCTGCA-3') that include 
restriction enzyme sites for the enzymes EcoRI, BglII, NsiI and XbaI and overhanging ends 
compatible with PstI and MluI are synthesized. Approximately 250 ng of each 
oligonucleotide are then mixed together in TE buffer and heated to 99°C for 1 min. After the 
solution is cooled slowly to room temperature, allowing the oligonucleotides to anneal due to 
their complementarity, the annealed oligonucleotides are ligated into PstI-MluI digested 
pIJ702 to yield the desired plasmid pN702. After transformation and isolation of the plasmid 
from Streptomyces lividans 1326, the identity of plasmid pN702, 4.3 kb in size, is verified by 
the observation of fragments of 0.75 kb and 3.6 kb after EcoRI-BamHI or XbaI-BamHI 
digestion. 
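
The annealing of SEQ ID NO:49 and SEQ ID NO:50 can be checked by comparing one strand with the reverse complement of the other. The sketch below does this and reports the single-stranded extensions left on the longer strand. The function names are assumptions introduced for the example; the sequences are copied from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SEQ49 = "GGAATTCAGATCTATGCATTCTAGAA"
SEQ50 = "CGCGTTCTAGAATGCATAGATCTGAATTCCTGCA"


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def overhangs(short_strand, long_strand):
    """Return the 5' and 3' extensions of long_strand that remain
    single-stranded after it anneals to short_strand."""
    core = reverse_complement(short_strand)
    start = long_strand.find(core)
    if start < 0:
        raise ValueError("strands are not complementary")
    return long_strand[:start], long_strand[start + len(core):]


five_prime, three_prime = overhangs(SEQ49, SEQ50)
print(five_prime, three_prime)  # CGCG TGCA

# Recognition sites carried by the duplex, in 5'-to-3' order on SEQ ID NO:49.
for enzyme, site in [("EcoRI", "GAATTC"), ("BglII", "AGATCT"),
                     ("NsiI", "ATGCAT"), ("XbaI", "TCTAGA")]:
    print(enzyme, SEQ49.find(site))
```

The 5' CGCG extension matches the overhang left by MluI and the 3' TGCA extension matches the overhang left by PstI, consistent with ligation into PstI-MluI digested pIJ702.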

ii) Construction of plasmid pXI (see FIG. 10): The 290 bp EcoRI-BamHI 
segment that carries the ermE* promoter is isolated from plasmid pIJ4070 and ligated into 
EcoRI-BglII digested pN702 to give plasmid pXI. The resulting mixture contains the desired 
plasmid pXI. After transformation and isolation of the plasmid from Streptomyces lividans 
1326, the identity of plasmid pXI, 4.6 kb in size, is verified by the observation of fragments 
of 1.0 kb and 3.6 kb after NsiI-BamHI digestion. 

iii) Construction of plasmid pXB6 (see FIG. 11): The 9.2 kb NsiI-XbaI 
segment of pKB6, prepared as described in Example 3(A)(vii) above, that carries all of the 
eryB genes is isolated and ligated into NsiI-XbaI digested pXI to give the desired plasmid 
pXB6. After transformation and isolation of the plasmid from Streptomyces lividans 1326, 
the identity of plasmid pXB6, 13.8 kb in size, is verified by the observation of fragments of 
0.41 kb, 1.9 kb, and 11.5 kb after EcoRI digestion. Plasmid pXB6 carries all of the eryB 
genes in a transcriptional fusion to the ermE* promoter on a Streptomyces plasmid. 

D. Construction of Streptomyces antibioticus ATCC 11891(pXB6): Approximately 
500 µg of plasmid pXB6, isolated from Streptomyces lividans 1326(pXB6), are 
electroporated into the oleandomycin producer Streptomyces antibioticus ATCC 11891 and 
several of the resulting ThioR colonies that appear on the R3M-agar plates containing 
thiostrepton are analyzed for their plasmid content. The presence of plasmid pXB6, 13.8 kb 
in size, is verified by the observation of fragments of 0.41 kb, 1.9 kb, and 11.5 kb after EcoRI 
digestion. 

E. Isolation, purification, and properties of 3-des-oleandrosyl-3-mycarosyl 
oleandomycin from Streptomyces antibioticus ATCC 11891(pXB6): Streptomyces 
antibioticus ATCC 11891(pXB6) is fermented for 5 days in SCM media with thiostrepton 
selection as described in General Methods. The fermentation broth is then cooled to 4°C and 
adjusted to pH 4.0 and extracted once with methylene chloride. The aqueous layer is 
readjusted to pH 9.0 and extracted twice with methylene chloride and the combined extracts 
are concentrated to a solid residue. This is digested in methanol and chromatographed over a 
column of Sephadex LH-20 in methanol. Fractions are tested for bioactivity against a 
sensitive organism, such as Staphylococcus aureus ThR, and active fractions are combined. 
The combined fractions are concentrated and the residue is digested in 10 ml of the upper 
phase of a solvent system consisting of n-heptane, benzene, acetone, isopropanol, and 0.05 M, 
pH 7.0 aqueous phosphate buffer (5:10:3:2:5, v/v/v/v/v), and chromatographed on an Ito Coil 
Planet Centrifuge in the same system. Closely eluting active fractions are combined, 
concentrated and partitioned between methylene chloride and dilute ammonium hydroxide 
(pH 9.0). The methylene chloride layer is separated and concentrated to yield the desired 
product as a white foam. 

Example 4: Construction and characterization of Streptomyces violaceoniger NRRL 
2834(pXC4) that produces 5-des-chalcosyl-5-desosaminoyl lankamycin 

A. Construction of plasmid pKC4 and intermediates (see FIG. 12) 

i) Construction of plasmid pKC1: The 2.4 kb DNA segment that carries the 
eryCII and eryCIII genes, comprised between coordinates 33 and 2413 of the sequence 
presented in SEQ ID NO:1, is amplified by PCR employing two deoxyoligonucleotides, 
SEQ ID NO:51 (5'-GAATGCATCTGGCTGGGCGGAGGGAATTCATG-3') and 
SEQ ID NO:52 (5'-GATCTAGACTTAAGTCATCGTGGTTCTCTCCTTCCTGCGGC-3'). 
After digestion with NsiI and XbaI the purified PCR fragment is ligated into NsiI 
and XbaI digested pK1 to give plasmid pKC1, 5.5 kb in size. The identity of plasmid pKC1, 
after transformation and isolation from E. coli, is verified by EcoRI digestion which releases 
fragments of 2.2 kb and 3.3 kb. 

ii) Construction of plasmid pKC2: The 732 bp DNA segment that carries the 
eryCVI gene, comprised between coordinates 2331 and 3063 of the sequence presented in 
SEQ ID NO:2, is amplified by PCR employing two deoxyoligonucleotides, 
SEQ ID NO:53 (5'-GATCCTTAAGCTCCGGAGGGAGCAGGGATG-3') and 
SEQ ID NO:54 (5'-GATCTAGACCTAGGTCATCCGCGCACACCGACGAAC-3'). After 
digestion with AflII and XbaI the purified PCR fragment is ligated into AflII and XbaI 
digested pKC1 to give plasmid pKC2, 6.2 kb in size. The identity of plasmid pKC2, after 
transformation and isolation from E. coli, is verified by XbaI-EcoRI digestion which releases 
fragments of 0.95 kb, 2.2 kb and 3.1 kb. 

iii) Construction of plasmid pKC3: The 2.7 kb DNA segment that carries the 
eryCIV and eryCV genes, comprised between coordinates 4650 and 7386 of the sequence 
presented in SEQ ID NO:2, is amplified by PCR employing two deoxyoligonucleotides, 
SEQ ID NO:55 (5'-GATCCTAGGCCGTCTACACCAGGACCGCCGG-3') and 
SEQ ID NO:56 (5'-GATCTAGATTAATCACCTTCCGCGCAGGAAGCCGC-3'). After 
digestion with AvrII and XbaI the purified PCR fragment is ligated into AvrII and XbaI 
digested pKC2 to yield plasmid pKC3, 9.0 kb in size. The identity of plasmid pKC3, after 
transformation and isolation from E. coli, is verified by SphI digestion which releases 
fragments of 4.0 kb and 5.0 kb. 

iv) Construction of plasmid pKC4: The DNA sequence of the eryCI gene has 
been determined (GenBank Accession # X15541). The 1.1 kb DNA segment that carries the 
eryCI gene, comprised between coordinates 38 and 1161 of the sequence indicated above, is 
amplified by PCR employing two deoxyoligonucleotides, SEQ ID NO:57 
(5'-GATCTTAAGCCGCCACTCGAACGGACACTCG-3') and SEQ ID NO:58 
(5'-GATCTAGATCAAGCCCCAGCCTTGAGGG-3'). After digestion with MseI and XbaI the 
fragment is ligated into MseI and XbaI digested pKC3 to give plasmid pKC4, 10.1 kb in 
size. The identity of plasmid pKC4, after transformation and isolation from E. coli, is 
verified by KpnI digestion which releases fragments of 0.15 kb, 0.31 kb, 4.1 kb and 5.5 kb. 
Plasmid pKC4 carries all of the eryC genes, eryCI-eryCVI, that are involved in the 
biosynthesis of desosamine and its attachment to the polyketide. 
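
The stated sizes of the amplified segments in steps (i) through (iv) follow from the coordinate pairs given for each. The sketch below recomputes them as the simple difference between end and start coordinates, which is the convention the text appears to use (for example, 3063 - 2331 = 732 bp). The data layout, function name, and 0.05 kb tolerance are assumptions made for the example.

```python
# Segment -> (start coordinate, end coordinate, stated size in kb).
SEGMENTS = {
    "eryCII + eryCIII": (33, 2413, 2.4),
    "eryCVI": (2331, 3063, 0.732),
    "eryCIV + eryCV": (4650, 7386, 2.7),
    "eryCI": (38, 1161, 1.1),
}


def span_kb(start, end):
    """Length of a segment in kb, taken as end minus start."""
    return (end - start) / 1000


for name, (start, end, stated_kb) in SEGMENTS.items():
    computed = span_kb(start, end)
    agrees = abs(computed - stated_kb) < 0.05
    print(f"{name}: {computed:.3f} kb computed, {stated_kb} kb stated, "
          f"agrees={agrees}")
```

Counting both end coordinates inclusively would add one base pair to each length, which does not change any of the rounded values.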

B. Construction of Plasmid pXSC4 (see FIG. 13): The 6.9 kb NsiI-XbaI segment of 
pKC4 that carries all of the eryC genes is isolated and ligated into PstI-XbaI digested pASX2, 
prepared as described in Example 2(A), to give the desired plasmid pXSC4, 14.9 kb in size, 
wherein all of the eryC genes are transcriptionally linked downstream of the ermE* promoter 
on an E. coli-Streptomyces shuttle plasmid. The identity of plasmid pXSC4, after 
transformation and isolation from E. coli, is verified by the observation of fragments of 0.29 
kb, 2.2 kb, and 12.4 kb after EcoRI digestion. 

C. Construction of Plasmid pXC4 (see FIG. 13): The 6.9 kb NsiI-XbaI segment of 
pKC4 that carries all of the eryC genes is isolated and ligated into NsiI-XbaI digested pXI, 
prepared as described in Example 3(C)(ii), to give the desired plasmid pXC4, 11.5 kb in size, 
wherein all of the eryC genes are transcriptionally linked downstream of the ermE* promoter 
on a Streptomyces plasmid. After transformation and isolation of the plasmid from 
Streptomyces lividans 1326, the identity of plasmid pXC4 is verified by the observation of 
fragments of 0.29 kb, 2.2 kb, and 9.0 kb after EcoRI digestion. 
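
The sizes of the expression plasmids in Examples 3 and 4 can be cross-checked as vector plus insert. The sketch below does so using only sizes stated in the text. The size of the pASX2 backbone is not stated; the value computed here is an inference, and the few base pairs of polylinker lost between the cloning sites are ignored. The variable and function names are assumptions made for the example.

```python
# Sizes in kb, as stated in the text.
INSERT_KB = {"eryB (from pKB6)": 9.2, "eryC (from pKC4)": 6.9}
PXI_KB = 4.6
STATED_KB = {"pXB6": 13.8, "pXC4": 11.5, "pXSB6": 17.2, "pXSC4": 14.9}


def predicted_size(vector_kb, insert_kb):
    """Vector plus insert, rounded to 0.1 kb."""
    return round(vector_kb + insert_kb, 1)


# pXI-based Streptomyces plasmids.
print(predicted_size(PXI_KB, INSERT_KB["eryB (from pKB6)"]), STATED_KB["pXB6"])
print(predicted_size(PXI_KB, INSERT_KB["eryC (from pKC4)"]), STATED_KB["pXC4"])

# Backbone size implied for the pASX2-based shuttle plasmids (inferred).
backbone_from_b = round(STATED_KB["pXSB6"] - INSERT_KB["eryB (from pKB6)"], 1)
backbone_from_c = round(STATED_KB["pXSC4"] - INSERT_KB["eryC (from pKC4)"], 1)
print(backbone_from_b, backbone_from_c)
```

Both shuttle plasmids imply the same backbone size, so the stated sizes of pXSB6 and pXSC4 are mutually consistent.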

D. Construction of Streptomyces violaceoniger NRRL 2834(pXC4): Approximately 
500 µg of the plasmid pXC4, isolated from Streptomyces lividans 1326(pXC4), are 
electroporated into the lankamycin producer Streptomyces violaceoniger NRRL 2834 and 
several of the resulting ThioR colonies that appear on the R3M-agar plates containing 
thiostrepton are analyzed for their plasmid content. The presence of plasmid pXC4 is verified 
by the observation of fragments of 0.29 kb, 2.2 kb, and 9.1 kb in size after EcoRI digestion 
of the plasmid. 

E. Isolation, purification, and properties of 5-des-chalcosyl-5-desosaminoyl 
lankamycin: S. violaceoniger NRRL 2834(pXC4) is fermented for 5 days in SCM media 
with thiostrepton selection as described in General Methods. The fermentation broth is then 
cooled to 4°C and adjusted to pH 4.0 and extracted once with methylene chloride. The 
aqueous layer is readjusted to pH 9.0 and extracted twice with methylene chloride and the 
combined extracts are concentrated to a solid residue. This is digested in methanol and 
chromatographed over a column of Sephadex LH-20 in methanol. Fractions are tested for 
bioactivity against a sensitive organism, such as Staphylococcus aureus ThR, and active 
fractions are combined. The combined fractions are concentrated and the residue is digested 
in 10 ml of the upper phase of a solvent system consisting of n-heptane, benzene, acetone, 
isopropanol, and 0.05 M, pH 7.0 aqueous phosphate buffer (5:10:3:2:5, v/v/v/v/v), and 
chromatographed on an Ito Coil Planet Centrifuge in the same system. Active fractions are 
combined, concentrated and partitioned between methylene chloride and dilute ammonium 
hydroxide (pH 9.0). The methylene chloride layer is separated and concentrated to yield the 
desired product as a white foam. 

Although the present invention is illustrated in the examples listed above in terms of 
preferred embodiments, these examples are not to be regarded as limiting the scope of the 
invention. The above illustrations serve to describe the principles and methodologies 
involved in creating the types of genetic alterations that can be introduced into Sac. erythraea 
and/or other Streptomyces that result in the synthesis of novel glycosylation-modified 
polyketide products. Although a single Type I alteration, leading to the production of, for 
example, 4"-deoxy-4"-oxo-erythromycin A, is specified herein, it is obvious to those skilled 
in the art that other Type I changes can be introduced into the eryB and/or eryC genes leading 
to novel glycosylation-modified polyketide structures. Examples of additional Type I 
alterations leading to useful novel compounds include but are not limited to: mutations in the 
eryBVII gene conceivably leading to 
3-α-D-mycarosyl-5-β-D-desosaminoyl-12-hydroxy-erythronolide B and mutations in the 
eryCVI gene conceivably leading to N-3a'-des-dimethyl erythromycin A. Moreover, it is 
obvious that Type I alterations in two or more different eryB and/or eryC genes can be 
combined leading to novel glycosylation-modified polyketide structures. Examples of 
combinations of two Type I alterations leading to useful compounds include but are not 
limited to: mutations in the eryBIV and eryBVII genes conceivably leading to 
3-α-D-4"-deoxy-4"-oxo-mycarosyl-5-β-D-desosaminoyl-12-hydroxy-erythronolide B; 
mutations in the eryBIV and eryCVI genes conceivably leading to 
4"-deoxy-4"-oxo-(N-3a'-des-dimethyl)-erythromycin A; and mutations in the eryBIV, 
eryBVII, and eryCVI genes conceivably leading to 
3-α-D-4"-deoxy-4"-oxo-mycarosyl-5-β-D-(N-3a'-des-dimethyl)-desosaminoyl-12-hydroxy-erythronolide B. 
All Type I mutations or combinations of two or more Type I mutations in the eryBII, 
eryBIV, eryBV, eryBVI, eryBVII, eryCII, eryCIII, eryCIV, eryCV, or eryCVI genes, the 
Sac. erythraea strains that carry said mutations or 
combinations of mutations, and the corresponding polyketides produced from said strains, 
therefore, are included within the scope of the present invention. 

Although the Type II mutation specified herein was constructed with the eryBVII gene 
on a self-replicating plasmid it is obvious that other eryB genes and eryC genes can be 
expressed in an antisense orientation leading to novel glycosylation-modified polyketide 
structures. Examples of additional Type II alterations leading to useful compounds include 
but are not limited to: antisense expression of the eryBIV gene conceivably leading to 
4"-deoxy-4"-oxo-erythromycin A and antisense expression of the eryCVI gene conceivably 
leading to N-3a'-des-dimethyl erythromycin A. Moreover, it will occur to those skilled in the 
art that promoters other than the ermE* promoter, for example the melC promoter of pIJ702, 
will be suitable for antisense expression, and that many self-replicating vectors in addition to 
pWHM4 will function to carry the antisense alteration. It will also occur to those skilled in 
the art that a self-replicating vector is not required for this invention and that the antisense 
alteration can be introduced directly into the chromosome using the same principles 
employed to construct a Type I gene alteration. An example of a Type II alteration that is 
introduced directly into the chromosome is the eryBVII antisense alteration described in 
Example 2 wherein DNA segments immediately upstream of the eryK gene are used to flank 
the ermE*-eryBVII-phage fd terminator grouping in a pWHM3 vector, and this vector is 
integrated into and then resolved from the chromosome leaving the ermE*-eryBVII-phage fd 
terminator grouping stably incorporated into this nonessential region of the chromosome of 
Sac. erythraea conceivably leading to the production of 
3-α-D-mycarosyl-5-β-D-desosaminoyl-12-hydroxy-erythronolide B. All Type II mutations 
in the eryBII, eryBIV, eryBV, eryBVI, eryBVII, eryCII, eryCIII, eryCIV, eryCV, or eryCVI 
genes whether carried on 
a self-replicating plasmid or integrated into a nonessential region of the chromosome, the Sac. 
erythraea strains that carry said mutations, and the corresponding polyketides produced from 
said strains, therefore, are included within the scope of the present invention. 

Although Type III alterations, leading to the production of 5-des-chalcosyl-5- 
desosaminoyl lankamycin in Streptomyces violaceoniger and 3-des-oleandrosyl-3-mycarosyl 
oleandomycin in Streptomyces antibioticus, are specified herein, it is obvious that Type III 
alterations can be introduced into any polyketide producing microorganism leading to novel 
glycosylation modified polyketides. It will also occur to those skilled in the art that both the 
eryB and eryC genes can either be cotransformed into a polyketide producing microorganism 
or grouped together on a single vector that is introduced into a polyketide producing 
microorganism. An example of a Type III change using both the eryB and eryC genes 
together is their introduction into Streptomyces violaceoniger conceivably leading to 
3-des-(4"-O-acetylarcanosyl)-3-mycarosyl-5-des-chalcosyl-5-desosaminoyl lankamycin. 
Although the Type III alterations specified herein have indicated a specific genetic order of 
the eryB or eryC genes, it will occur to those skilled in the art that many different genetic 
arrangements of the eryB or eryC genes will produce similar results. It will also occur to 
those skilled in the art that certain arrangements of the eryB and/or eryC genes that lack one 
or more of the respective eryB and/or eryC genes will lead to the production of novel 
glycosylated polyketides in which intermediate compounds in the biosynthesis of mycarose 
and/or desosamine, respectively, such as those outlined in FIGS. 2 and 3, are attached to the 
polyketide. An example of a Type III alteration in which only a subset of the eryB and/or 
eryC genes are used is the introduction of a pXC4 derivative that lacks the eryCVI gene, 
removed by digestion of plasmid pXC4 with AflII and AvrII followed by treatment with the 
Klenow fragment of DNA polymerase I and religation, into Streptomyces violaceoniger 
leading to the production of 5-des-chalcosyl-5-(N-3a'-des-dimethyl desosaminoyl) 
lankamycin. It will also occur to those skilled in the art that promoters other than 
ermE or ermE*, such as the melC promoter of plasmid pIJ702, and vectors other than 
pWHM4 or pIJ702 can also be utilized in the construction of a Type III alteration, and these 
variants are, of course, considered to be within the scope of the invention. Finally, it will also 
occur to those skilled in the art that a self-replicating vector is not required for this invention 
and that an assembly of sugar biosynthesis genes can be introduced directly into the 
chromosome of a heterologous host using the same principles employed to construct a Type I 
gene alteration once a nonessential region of the heterologous host chromosome has been 
identified. Alternatively, plasmids or bacteriophages which undergo site-specific 
recombination with host genes may also be used to introduce eryB and eryC genes into a host 
to effect Type III alterations. All Type III alterations using one or more of the eryBII, 
eryBIV, eryBV, eryBVI, eryBVII, eryCII, eryCIII, eryCIV, eryCV, or eryCVI genes, the 
polyketide producing strains that carry said alterations, and the corresponding polyketides 
produced from said strains, therefore, are included within the scope of the present invention. 

In addition, it is also possible to create combinations of Type I and Type II alterations 
such that some Type I eryB and/or eryC mutations are introduced directly into the Sac. 
erythraea chromosome in the appropriate locus, while other eryB and/or eryC genes are 
inactivated by Type II alterations using a self-replicating or integrating vector. For example, 
combination of a Type I alteration, such as a mutation in eryBIV, and a Type II alteration, 
such as transformation with pASBVII, will conceivably lead to production of 3-a-D-4"- 
deoxy-4"-oxo-mycarosyl-5-B-D-desosaminoyl-12-hydroxy-erythronolide B. All 
combinations of two or more alterations of Type I and Type II, the Sac. erythraea strains that 
carry such alterations, and the glycosylated polyketides produced from such strains are 
included within the scope of the present invention. 

As an extension of the examples reported with the eryB and/or eryC genes, it is 
possible to apply the method described herein to heterologous sugar biosynthesis genes that 
are similar to the eryB and/or eryC genes. The construction of strains carrying heterologous 
sugar biosynthesis genes that lead to the production of novel glycosylated polyketides 
requires: (i) cloning of the sugar biosynthesis genes from any other glycosylated-polyketide 
producing actinomycete, (ii) determining the nucleotide sequence of the cloned gene(s); (iii) 
excising and assembling the cloned gene(s) into vectors suitable for Type I, Type II, or Type 
III alterations; and (iv) transformation of polyketide producing microorganisms and screening 
for the novel compound. Any polyketide-associated sugar biosynthesis gene can thus be 
precisely excised from the genome of a glycosylated polyketide producing microorganism 
and altered or arranged with other sugar biosynthesis genes and then introduced into the same 
or another polyketide producing microorganism to create a novel glycosylated polyketide of 
predicted structure. Thus, for example, a Type I or Type II alteration of a heterologous gene 
that is similar to an eryB and/or eryC gene, such as can be found in the eryBVII homolog for 
the synthesis of L-oleandrose in Streptomyces antibioticus, to result in the production of 3- 
des-L-oleandrosyl-3-D-oleandrosyl oleandomycin is included within the scope of the present 
invention. Similarly, a Type III assembly of the genes for the synthesis of a sugar other than 
mycarose or desosamine, such as can be found in the genes for the synthesis of angolosamine 
in Streptomyces eurythermus, and their transformation into Sac. erythraea to result in the 
synthesis of 5-des-desosaminoyl-5-angolosaminoyl-erythromycin A is included within the 
scope of the present invention. 

It will occur to those skilled in the art that the Type I, Type II, and Type III genetic 
manipulations described herein and the polyketide producing microorganisms into which they 
are introduced are in no way exclusive. Hence, the choice of a convenient host and the 
choice of a Type I, Type II, or Type III alteration is based solely on the relatedness of the 
desired novel glycosylated polyketide to a natural counterpart. Therefore, Type I, Type II, 
and Type III alterations can be constructed in any polyketide producing microorganism 
employing either endogenous or exogenous sugar biosynthesis genes. Thus all Type I, Type 
II, and Type III mutations or various combinations thereof constructed in any polyketide 
producing microorganism according to the principles described herein, and the respective 
polyketides produced from such strains, are included within the scope of the present 
invention. Examples of glycosylated polyketides that can be altered by creating Type I, Type 
II, or Type III changes in the producing microorganisms include, but are not limited to 
macrolide antibiotics such as erythromycin, tylosin, spiramycin, etc; aromatic polyketides 
such as daunorubicin and doxorubicin, etc; polyenes such as candicidin, amphotericins, etc; 
and other complex polyketides such as avermectin. 

Whereas the novel derivatives or modifications of erythromycin described herein have 
been specified as the A derivatives, such as 4"-deoxy-4"-oxo-erythromycin A, those skilled in 
the art understand that the wild type strain of Sac. erythraea produces a family of 
erythromycin compounds, including erythromycin A, erythromycin B, erythromycin C, and 
erythromycin D. Thus, modified strains of Sac. erythraea, such as strain ERBIV, for 
example, would be expected to produce the corresponding members of the 4"-deoxy-4"-oxo- 
erythromycin family, including 4"-deoxy-4"-oxo-erythromycin A, 4"-deoxy-4"-oxo- 
erythromycin B, 4"-deoxy-4"-oxo-erythromycin C, and 4"-deoxy-4"-oxo-erythromycin D. 
Similarly, all other modified strains of Sac. erythraea that produce novel glycosylated 
erythromycin derivatives would be expected to produce the A, B, C, and D forms of said 
derivatives. For example, modified Sac. erythraea strains that produce 6-deoxyerythromycin, 
6,12-dideoxyerythromycin and 6,7-anhydroerythromycin would be expected to produce novel 
glycosylation-modified polyketides by introduction of the additional modification of a Type 
I, II or III change in a sugar biosynthesis gene. Therefore, all members of the family of each 
of the novel erythromycins described herein or produced by these methods are included 
within the scope of the present invention. 

Variations and modifications of the methods for obtaining the desired plasmids, hosts 
for cloning and choices of vectors and eryB and/or eryC genes to clone and modify, other 
than those described herein will occur to those skilled in the art. For example, although we 
have described the use of plasmids pWHM3, pWHM4, and pIJ702, other vectors can be 
employed wherein all or part of said plasmids is replaced by other DNA segments that 
function in a similar manner, such as replacing the pUC19 component of pWHM3 and 
pWHM4 with pBR322, available from BRL; or employing different segments of the pIJ101 
replicon in pWHM3 and pIJ702, or the pJV1 replicon in pWHM4, respectively; or employing 
selectable markers other than thiostrepton- or ampicillin-resistance. These are just a few of a 
long list of possible examples all of which are included within the scope of the present 
invention. Similarly, the segments of the eryB and eryC loci that have been specified herein 
to generate the various Type I, Type II, and Type III alterations can readily be replaced with 
other segments of different length encoding the same functions, either produced by PCR 
amplification of genomic DNA or of an isolated clone, or by isolating suitable restriction 
fragments from Sac. erythraea. In the same way it is possible to create Type I mutations 
functionally equivalent to those described herein by altering through deletion, insertion, or 
site directed mutagenesis different portions of the corresponding genes. It is also possible to 
create Type II mutations functionally equivalent to those described herein by employing 
larger or smaller portions of the corresponding genes; and it is possible to create Type III 
mutations using larger or smaller segments of the corresponding genes in the same or 
different linear order described herein. Additional modifications include changes in the 
restriction sites used for cloning or in the general methodologies described above. All such 
changes are included in the scope of the present invention. It will also occur to those skilled 
in the art that different methods are available to ferment Sac. erythraea and other polyketide 
producing microorganisms and to extract the novel polyketides specified herein, and all such 
methods are also included within the scope of this invention. 

It will also be apparent that many modifications and variations of the invention as set 
forth herein are possible without departing from the spirit and scope thereof, and that, 
accordingly, only such limitations should be imposed as are indicated by the appended claims. 
We claim: 

1. An isolated single or double stranded polynucleotide having a nucleotide sequence 
which comprises (a) a nucleotide sequence selected from the group consisting of (i) the 
sense sequence of SEQ ID NO:1 from about nucleotide position 54 to about nucleotide 
position 1136; (ii) the sense sequence of SEQ ID NO:1 from about nucleotide position 1147 
to about nucleotide position 2412; (iii) the sense sequence of SEQ ID NO:1 from about 
nucleotide position 2409 to about nucleotide position 3410; (iv) the sense sequence of SEQ 
ID NO:2 from about nucleotide position 80 to about nucleotide position 1048; (v) the sense 
sequence of SEQ ID NO:2 from about nucleotide position 1048 to about nucleotide position 
2295; (vi) the sense sequence of SEQ ID NO:2 from about nucleotide position 2348 to about 
nucleotide position 3061; (vii) the sense sequence of SEQ ID NO:2 from about nucleotide 
position 3214 to about nucleotide position 4677; (viii) the sense sequence of SEQ ID NO:2 
from about nucleotide position 4674 to about nucleotide position 5879; (ix) the sense 
sequence of SEQ ID NO:2 from about nucleotide position 5917 to about nucleotide position 
7386; and (x) the sense sequence of SEQ ID NO:2 from about nucleotide position 7415 to 
about nucleotide position 7996; 

(b) sequences complementary to the sequences of (a); 

(c) sequences that, on expression, encode a polypeptide encoded by the 
sequences of (a); and 

(d) analogous sequences that hybridize under stringent conditions to the 
sequences of (a). 
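
As a purely illustrative aid, and not part of the claimed subject matter, the coordinate ranges recited in part (a) can be tabulated and checked arithmetically. The Python sketch below assumes that each range is 1-based and inclusive of both end positions, as the numbering in the figures suggests. The names CLAIMED_RANGES, range_length, codon_count, and extract are hypothetical helpers introduced only for this example.

```python
# Coordinate ranges recited in claim 1(a): (sequence name, start, end).
# Assumption: positions are 1-based and inclusive at both ends.
CLAIMED_RANGES = [
    ("SEQ ID NO:1", 54, 1136),
    ("SEQ ID NO:1", 1147, 2412),
    ("SEQ ID NO:1", 2409, 3410),
    ("SEQ ID NO:2", 80, 1048),
    ("SEQ ID NO:2", 1048, 2295),
    ("SEQ ID NO:2", 2348, 3061),
    ("SEQ ID NO:2", 3214, 4677),
    ("SEQ ID NO:2", 4674, 5879),
    ("SEQ ID NO:2", 5917, 7386),
    ("SEQ ID NO:2", 7415, 7996),
]


def range_length(start, end):
    """Number of nucleotides in a 1-based inclusive range."""
    return end - start + 1


def codon_count(start, end):
    """Number of complete codons in the range, or None if the
    length is not a multiple of three."""
    length = range_length(start, end)
    return length // 3 if length % 3 == 0 else None


def extract(sequence, start, end):
    """Slice a sequence string using 1-based inclusive coordinates."""
    return sequence[start - 1:end]


if __name__ == "__main__":
    for name, start, end in CLAIMED_RANGES:
        print(name, start, end, range_length(start, end), codon_count(start, end))
```

Under the inclusive-coordinate assumption, each recited range spans a whole number of codons; for example, positions 54 to 1136 of SEQ ID NO:1 cover 1083 nucleotides, or 361 codons. The same arithmetic shows that ranges (ii) and (iii) of SEQ ID NO:1 share four positions (2409 to 2412).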

2. The polynucleotide of claim 1 that is a DNA molecule or RNA molecule. 

3. The polynucleotide of claim 2 wherein the nucleotide sequence is the nucleotide 
sequence of (a) selected from the group consisting of (i) the sense sequence of SEQ ID NO:1 
from about nucleotide position 54 to about nucleotide position 1136; (ii) the sense sequence 
of SEQ ID NO:1 from about nucleotide position 1147 to about nucleotide position 2412; (iii) 
the sense sequence of SEQ ID NO:2 from about nucleotide position 2348 to about nucleotide 
position 3061; (iv) the sense sequence of SEQ ID NO:2 from about nucleotide position 4674 
to about nucleotide position 5879; and (v) the sense sequence of SEQ ID NO:2 from about 
nucleotide position 5917 to about nucleotide position 7386. 

4. The polynucleotide of claim 2 wherein the nucleotide sequence is the nucleotide 
sequence of (a) selected from the group consisting of (i) the sense sequence of SEQ ID NO:1 
from about nucleotide position 2409 to about nucleotide position 3410; (ii) the sense 
sequence of SEQ ID NO:2 from about nucleotide position 80 to about nucleotide position 
1048; (iii) the sense sequence of SEQ ID NO:2 from about nucleotide position 1048 to about 
nucleotide position 2295; (iv) the sense sequence of SEQ ID NO:2 from about nucleotide 
position 3214 to about nucleotide position 4677; and (v) the sense sequence of SEQ ID NO:2 
from about nucleotide position 7415 to about nucleotide position 7996. 

5. The polynucleotide of claim 2 wherein the nucleotide sequence is the nucleotide 
sequence of (a) having the sense sequence of SEQ ID NO:2 from about nucleotide position 
80 to about nucleotide position 1048. 

6. A vector comprising the DNA molecule of claim 2. 

7. The vector of claim 6 further comprising an enhancer-promoter operatively linked to 
the polynucleotide. 

8. The vector of claim 6 wherein the polynucleotide has the nucleotide sequence of 
claim 5. 

9. A host cell transformed with the vector of claim 6 or claim 7 or claim 8. 

10. The transformed host cell of claim 9 that is a bacterial cell. 

11. The transformed host cell of claim 10 wherein the bacterial cell is selected from the 
group consisting of Streptomyces and E. coli. 

12. A method for directing the biosynthesis of specific glycosylation-modified 
polyketides by genetic manipulation of a polyketide-producing microorganism, said method 
comprising the steps of: 

(1) isolating a sugar biosynthesis gene-containing DNA sequence according to claim 1; 

(2) identifying within said gene-containing DNA sequence one or more DNA 
fragments responsible for the biosynthesis of a polyketide-associated sugar or its attachment 
to a polyketide; 

(3) creating one or more specified changes into said DNA fragment or fragments, 
thereby resulting in an altered DNA sequence; 

(4) introducing said altered DNA sequence into a polyketide-producing 
microorganism to replace the original sequence, said altered DNA sequence, when translated, 
resulting in altered enzymatic activity capable of effecting the production of said specific 
glycosylation-modified polyketide; 

(5) growing a culture of said altered polyketide-producing microorganism under 
conditions suitable for the formation of said specific glycosylation-modified polyketide; and 

(6) isolating said specific glycosylation-modified polyketide from said culture. 

13. The method of claim 12 wherein said specified change in said DNA fragment or 
fragments results in the inactivation of at least one enzymatic activity involved in the 
biosynthesis of a polyketide-associated sugar or in its attachment to a polyketide. 

14. The method of claim 13 wherein said polyketide-associated sugar is L-mycarose. 

15. The method of claim 13 wherein said polyketide-associated sugar is D-desosamine. 

16. A method for directing the biosynthesis of specific glycosylation-modified 
polyketides by genetic manipulation of a polyketide-producing microorganism, said method 
comprising the steps of: 

(1) isolating a sugar biosynthesis gene-containing DNA sequence according to claim 1; 

(2) identifying within said gene-containing DNA sequence one or more DNA 
fragments responsible for the biosynthesis of a polyketide-associated sugar or its attachment 
to a polyketide; 

(3) reversing the strand orientation of said DNA fragment or fragments, thereby 
resulting in an altered DNA sequence which, when transcribed, results in production of an 
antisense mRNA; 

(4) introducing said altered DNA sequence into a polyketide-producing 
microorganism, having an mRNA capable of binding to said antisense mRNA to produce an 
altered polyketide-producing microorganism capable of producing said specific 
glycosylation-modified polyketide; 

(5) growing a culture of said altered polyketide-producing microorganism under 
conditions suitable for the formation of said specific glycosylation-modified polyketide; and 

(6) isolating said specific glycosylation-modified polyketide from said culture. 
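
The strand reversal of step (3) can be illustrated computationally. The following Python sketch is an illustration only, under stated assumptions: it takes the first twelve nucleotides of the open reading frame that begins at about position 54 of SEQ ID NO:1 (ATGACCACGACC, as shown in FIG. 4A-1) and computes the reverse complement, which corresponds to the transcript produced when the fragment is placed in the opposite orientation relative to a promoter. The function names are hypothetical and were chosen for this example.

```python
# Watson-Crick complements for the DNA and RNA alphabets.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (5' to 3')."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def to_rna(dna):
    """Write a DNA coding-strand sequence as the RNA it specifies."""
    return dna.upper().replace("T", "U")


def antisense_rna(sense_dna):
    """RNA transcribed when the fragment is inverted: the reverse
    complement of the sense strand, written as RNA."""
    return to_rna(reverse_complement(sense_dna))


def can_pair(mrna, antisense):
    """True if the antisense RNA is fully complementary to the mRNA."""
    return antisense == mrna.translate(_RNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    sense = "ATGACCACGACC"          # opening codons from FIG. 4A-1
    mrna = to_rna(sense)
    anti = antisense_rna(sense)
    print(mrna, anti, can_pair(mrna, anti))
```

The check in can_pair mirrors the requirement in step (4) that the host mRNA be capable of binding to the antisense mRNA; it tests sequence complementarity only and says nothing about binding efficiency in a cell.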

17. A method for directing the biosynthesis of specific glycosylation-modified 
polyketides by genetic manipulation of a polyketide-producing microorganism, said method 
comprising the steps of: 

(1) isolating a sugar biosynthesis gene-containing DNA sequence according to claim 1; 

(2) identifying within said gene-containing DNA sequence one or more DNA 
fragments responsible for the biosynthesis of a polyketide-associated sugar or its attachment 
to a polyketide; 

(3) introducing said DNA fragment or fragments into a distinct polyketide-producing 
microorganism to produce an altered polyketide-producing microorganism capable of 
producing said specific glycosylation-modified polyketide; 

(4) growing a culture of said polyketide-producing microorganism containing said 
DNA fragment or fragments under conditions suitable for the formation of said specific 
glycosylation-modified polyketide; and 

(5) isolating said specific glycosylation-modified polyketide from said culture. 

18. The method of claim 13 or claim 16 or claim 17 wherein said DNA fragment 
comprises one or more genes which encode an enzymatic activity involved in the 
biosynthesis of L-mycarose or in its attachment to a polyketide. 

19. The method of claim 13 or claim 16 or claim 17 wherein said DNA fragment 
comprises one or more genes which encode an enzymatic activity involved in the 
biosynthesis of D-desosamine or in its attachment to a polyketide. 

20. The method of claim 13 or claim 16 or claim 17 wherein said DNA fragment is the 
sequence of claim 8. 

21. An isolated polypeptide having an amino acid sequence encoded by a nucleotide 
sequence selected from the group consisting of the sense sequence of SEQ ID NO:1 from 
about nucleotide position 54 to about nucleotide position 1136; the sense sequence of SEQ ID 
NO:1 from about nucleotide position 1147 to about nucleotide position 2412; the sense sequence 
of SEQ ID NO:1 from about nucleotide position 2409 to about nucleotide position 3410; the 
sense sequence of SEQ ID NO:2 from about nucleotide position 80 to about nucleotide 
position 1048; the sense sequence of SEQ ID NO:2 from about nucleotide position 1048 to 
about nucleotide position 2295; the sense sequence of SEQ ID NO:2 from about nucleotide 
position 2348 to about nucleotide position 3061; the sense sequence of SEQ ID NO:2 from 
about nucleotide position 3214 to about nucleotide position 4677; the sense sequence of SEQ 
ID NO:2 from about nucleotide position 4674 to about nucleotide position 5879; the sense 
sequence of SEQ ID NO:2 from about nucleotide position 5917 to about nucleotide position 
7386; and the sense sequence of SEQ ID NO:2 from about nucleotide position 7415 to about 
nucleotide position 7996. 

22. An isolated polypeptide of claim 21 encoded by the sequence of SEQ ID NO:2 from 
about nucleotide position 80 to about nucleotide position 1048. 
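
For illustration only, the relationship between a recited nucleotide range and the polypeptide it encodes can be sketched with the standard genetic code. The Python example below is a minimal sketch and not a description of any method used to prepare the figures: the function translate and the table layout are assumptions made for this example, and the test fragment is the first seven codons of the reading frame that begins at about position 54 of SEQ ID NO:1, as shown in FIG. 4A-1.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a sense-strand DNA string in frame 1, stopping after
    the first stop codon ('*'). Unrecognized codons become 'X'."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


if __name__ == "__main__":
    # Positions 54 to 74 of SEQ ID NO:1 as read from FIG. 4A-1.
    print(translate("ATGACCACGACCGATCGCGCC"))  # prints MTTTDRA
```

The output agrees with the one-letter translation printed beneath the nucleotide sequence in FIG. 4A-1. The standard table reads GTG as valine, which is also how the figures render the GTG codons at the start of several of the reading frames.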

1 CACCCCGACGCGATCGCGCGGCACATCGACGCCTGGCTGGGCGGAGGGAATTCATGACCA 60 
M T T 

61 CGACCGATCGCGCCGGGCTGGGCAGGCAGCTCCAGATGATCCGCGGCCTGCACTGGGGTT 120 
T DRAGLGRQLQMIRGLHWGY 

121 ACGGCAGCAACGGCGACCCTTACCCGATGCTGCTGTGCGGACACGACGACGACCCGCAGC 180 
GSNGDP YPMLLCGHDDDPQR 

181 GCCGGT ACCGGTCGATGCGCGAGTCCGGTGTGCGGCGCAGGACCGAGACGTGGGTGGTGG 2 4 0 
RYRSMRESGVRRRTETWVVA 

241 CCGACCACGCCACCGCCCGGCAGGTGCTCGACGACCCCGCGTTCACCCGCGCCACCGGAC 300 
DHATARQVLDDPAFTRATG R 

301 GCACACCGGAATGGATGCGGGCCGCGGGCGCGCCACCCGCCGAGTGGGCCCAGCCGTTCC 3 60 
TPEWMRAAGAPPAEWAQPFR 

361 GGGACGTGCACGCCGCGTCCTGGGAAGGCGAGGTCCCCGACGTCGGGGAACTGGCGGAGA 4 20 
DVHA ASWEGEVPDVGELAES 

4 21 GCTTCGCCGGTCTGCTCCCCGGCGCGGGCGCGCGGCTGGACCTGGTCGGCGACTTCGCCT 4 80 
FAGLLPGAGARLDLVGDFAW 

4 81 GGCAGGTACCGGTGCAGGGCATGACCGCCGTGCTCGGCGCAGCCGGAGTGCTGCGCGGCG 54 0 
QVP VQGMTAVLGAAGVLRGA 

541 CCGCGTGGGACGCCCGCGTCAGCCTGGACGCCCAGCTCAGCCCGCAGCAGCTCGCGGTGA 600 
AW DARVSLDAQLSPQQLAVT 

601 CCGAAGCAGCGGTCGCGGCACTGCCCGCCGACCCCGCACTGCGCGCCCTGTTCGCCGGGG 6 60 
EAAVAALPADPALRALFAGA 

661 CCGAGATGACCGCGAACACCGTGGTCGACGCGGTCCTGGCCGTCTCGGCCGAACCGGGGC 720 
EMTANTV VDAVLAVSAEPGL 

7 21 TGGCCGAACGGATCGCCGACGACCCCGCCGCCGCGCAGCGAACCGTCGCCGAGGTGCTGC 7 80 
AER IADDPAAAQRTVAEVLR 

7 81 GCCTGCACCCGGCATTGCACCTGGAGCGGCGCACGGCCACCGCAGAGGTGCGGCTCGGCG 8 4 0 

LHP ALHLERRTATAEVRLGE 

8 41 AGCACGTGATCGGCGAAGGCGAGGAGGTCGTGGTCGTCGTCGCGGCGGCCAACCGCGACC 900 

HV I GEGEEVVVV VAAANRDP 

901 CGGAGGTCTTCGCCGAGCCCGACCGCCTCGACGTGGACCGCCCCGACGCCGACCGCGCGC 9 60 
EVFAEPDRLDVDRPDADRAL 



FIG. 4A-1 
961 TGTCGGCACATCGCGGCCACCCCGGCAGGCTGGAGGAGCTGGTCACCGCGCTCGCCACCG 1020 
SAHRGHPGRLEELVTALATA 



1021 CCGCACTGCGGGCCGCGGCCAAGGCGCTGCCCGGACTCACGCCCAGCGGCCCGGTCGTCC 108 0 
AL RAAAKALP GLT P SGPVVR 



10 81 GGCGCCGCCGATCACCCGTCCTGCGGGGAACCAACCGCTGCCCCGTCGAGCTCTGAGGAT 114 0 
RRRS PV LRGTNRCP V E L * 



1141 TCCGCGATGCGCGTCGTCTTCTCCTCCATGGCCAGCAAGAGCCACCTCTTCGGCCTCGTC 1200 
MRVVFSSMASKSHLFGLV 



1201 CCCCTCGCATGGGCGTTCCGCGCGGCGGGGCACGAGGTCCGCGTGGTCGCGTCCCCGGCG 1260 
P LAWAFRAAGHEVRVVASPA 



12 61 CTCACCGAGGACATCACCGCGGCCGGGCTGACCGCCGTCCCGGTCGGCACCGACGTCGAC 1320 
LTED ITAAGLTAVP VGTDVD 



1321 CTCGTGGACTTCATGACCCACGCGGGCCACGACATCATCGACTACGTCCGGAGCCTGGAC 1380 
LVDF MTHAGHD I ID YVRSLD 



1381 TTCAGCGAGCGGGACCCCGCCACCTTGACCTGGGAGCACCTGCGGGGCATGCAGACCGTG 144 0 
FSERDPATLTWEHLRGMQTV 



14 41 CTCACCCCGACCTTCTACGCCCTGATGAGCCCGGACACGCTCATCGAAGGCATGGTCTCG 1500 
LTPTFYALMSPDTLIEGMVS- 



1501 TTCTGCCGGAAGTGGCGGCCCGACCTGGTCATCTGGGAGCCGCTCACCTTCGCCGCGCCC 1560 
FCRKWRPDLVIWEP LTFAAP 



15 61 ATCGCGGGCGCGGTGACCGGAACGCCGCACGCGCGGCTGCTGTGGGGACCCGAC ATCACC 1620 
I AGAVTGTPHARLLWGPD IT 



1621 ACCCGGGCGCGGCAGAACTTCCTCGGCCTGCTGCCCGACCAGCCGGAGGAGCACCGGGAG 1680 
TRARQNFLGLLPDQPEEHRE 



1681 ' GGCCCGCTCGCCGAGTGGCTCACCTGGACGCTGGAGAAGTACGGCGGCCCGGCCTTCGAC 17 4 0 
G P LA EW LTWTLEKY GGPAFD 



17 41 GAGGAGGTGGTCGTCGGGCAGTGGACGATCGACCCCGCCCCGGCCGCGATCAGGCTCGAC 1800 
EEVVVGQWTID PAP AAIRLD 



1801 ACCGGCCTGAAGACCGTCGGGATGCGCTACGTCGACTACAACGGGCCGTCCGTGGTGCCG 18 60 
TGLKTVGMRYVDYNGPSVVP 



18 61 GAATGGCTGCACGACG AGCCCGAGCGCCGCCGCGTGTGCCTC ACGCTCGGGATCTCCAGC 1920 
EWLHDEPERRRVCLTLGISS 
1921 CGCGAGAACAGCATCGGGCAGGTCTCCATCGAGGAGCTGCTGGCTGCCGTCGGCGACGTC 1980 
R E N S IGQVSIEELLGAVGDV 

1981 GACGCCGAGATCATCGCGACCTTCGACGCGCAGCAGCTAGAAGGCGTCGCGAACATCCCG 204 0 
DAEI IATFDAQQLEGVANIP 

2041 CACAACGTCCGCACGGTCGGCTTCGTCCCGATGCACGCGCTGCTGCCGACCTGCGCGGCG 2100 
HNVRTVGFVPMHALLPTCAA 

2101 ACGGTGCACCACGGCGGACCCGGGAGCTGGCACACCGCGGCGATCCACGGCGTGCCGCAG 2160 
TVHHGGPGSWHTAAIHGVPQ 

2161 GTGATCCTGCCCGACGGCTGGGACACCGGCGTGCGCGCGCAGCGCACGCAGGAATTCGGG 2 220 
VILPDGWDTGVRAQRTQEFG 

2221 GCGGGGATCGCGCTGCCCGTGCCCGAGCTGACCCCCGACCAGCTCCGGGAGTCGGTGAAG 2280 
AGIALPVPELTPDQLRESVK 

2281 CGGGTCCTCGACGACCCGGCCCACCGCGCCGGCGCGGCGCGGATGCGCGACGACATGCTC 2340 
RVLDDPAHRAGAARMRDDML 

2341 GCGGAGCCGTCACCGGCCGAGGTCGTCGGCATCTGCGAGGAACTGGCCGCAGGAAGGAGA 2 400 
AEP S P AEVVG I C.EE LAAGRR 

2401 GAACCACGATGACCACCGACGCCGCGACGCACGTGCGGCTCGGGCGTTCCGCGCTGCTCA 2460 
E P R * 

MTTDAATHVRLGRSALLT 

24 61 CCAGCAGGCTCTGGCTCGGCACGGTGAACTTCAGCGGACGCGTCGAGGACGACGACGCGC 2520 
SRLWLGTVNFSGRVEDDDAL 

2521 TGCGCCTG ATGG ACC ACGCCCGGG ACCGCGGC ATCAACTGCCTCGACACCGCCG ACATGT 2 58 0 
RLMDHARDRGINCLDTADMY 

2581 ACGGCTGGCGGCTCTACAAGGGCCACACCGAGGAGCTGGTGGGCAGGTGGCTGGCCCAGG 2 64 0 
GWRLYKGHTEELVGRWLAQG 

2 641 GCGGCGGACGGCGCGAGG AC ACCGTGCTGGCGACCAAGGTCGGCGGCGAGATGAGCGAGC 27 0 0 
GGRREDTVLATKVGGEMS ER 

2701 GCGTCAACGACAGCGGGCTGTCGGCGCGGCACATCATCGCCTCCTGCGAGGGATCGCTGC 2760 
VNDSGLSARHIIASCEGSLR 

27 61 GCAGGCTGGGCGTCGACC ACATCGACGTCTACCAGATGCACCAC ATCGACCGGTCCGCGC 2 820 
RLGVD HIDVYQMHHIDRSAP 

2821 CGTGGGACGAGGTGTGGCAGGCCATGGACAGCCTCGTCGCCAGCGGCAAGGTCTCCTACG 2880 
WDEVWQAMDSLVASGKVSYV 
2881 TCGGCTCGTCGAACTTCGCGGGCTGGCACATCGCCGCCGCGCAGGAGAACGCCGCCCGCC 2940 
GS SNFAGWHIAAAQENAARR 



29 41 GCCACTCCCTGGGC ATGGTCTCCCACCAGTGCCTGT ACAACCTGGCGGTCCGGCACGCCG 3000 
HS LGMVS HQCLYNLAVRHAE 



3001 AGCTGGAGGTGCTGCCCGCCGCGCAGGCCT ACGGGCTCGGCGTCTTCGCCTGGTCGCCGC 3060 
LEV LPAAQAYGLGVFAWSPL. 



30 61 TGCACGGCGGCCTGCTCAGCGGAGCGCTGGAGAAGCTGGCCGCGGGCACCGCGGTGAAGT 3 120 
HGGLLS GALEKLAAGTAVKS 



3121 CGGCGCAGGGCCGTGCGCAGGTGCTGTTGCCGTCCCTGCGCCCGGCGATCGAGGCCTACG 318 0 
AQGRAQVLLP SLRP AI E A Y E 



3181 AGAAGTTCTGCCGCAACCTCGGCGAAGACCCGGCCGAGGTGGGGCTCGCATGGGTGCTGT 324 0 
KFCRNLGEDPAEVGLAWVLS 



32 41 CCCGGCCCGGCATCGCCGGCGCCGTCATCGGCCCGCGAACCCCCGAGCAGCTCGACTCCG 3300 
RP GIAGAVIGPRTP E" Q L D S A 



3301 CGCTGAAGGCGTCCGCGATGACCCTGGACGAGCAGGCGCTGTCCGAACTGGACGAGATCT 3360 
LKASAMTLDEQALS ELDEIF 



33 61 TCCCCGCGGTGGCCTCCGGCGGCGCGGCGCCGGAAGCCTGGTTGCAGTGAGCACAAGAGG 3420 
PAVASGGAAPEAWLQ* 



3421 AACCGAGAAAGGATACGGCTGGTG AGCGTGAAGCAGAAGTCAGCGTTGCAGGACCTGGTC 3 4 80 



3481 GACTTCGCCAAGTGGCACGTGTGGACCAGGGTGCGGCCGTCCAGCCGTGCGCGCCTGGCC 3540 



3541 TACGAGCTGTTCGCCGACGACC ACGAGGCCACGACCGAGGGCGCCTACATCAACCTCGGC 3 600 



3601 TACTGGAAGCCCGGGTGCGCCGGCCTGGAGGAGGCCAACCAGGAGCTGGCGAACCAGCTC 3 660 



3661 GCCGAGGCCGCGGGGATCAGCGAGGGCGACGAGGTGCTCGACGTCGGGTTCGGGCTCGGC 3720 



37 21 GCGCAGGACTTCTTCTGGCTCGACCTGCAGCCAGCT 37 5 6 
1 CGGGTTGCCGCACATCGCGCTGGGGAGATTCTTTGAATTTCGCCCGTAGCACCGACCTGG 60 

6 1 AAAGCGAGCAAATGCTCCGGTGAATGGGATCAGTGATTCCCCGCGTCAATTGATCACCCT 

VNGISDSPRQ^i * u 



120 



121 TCTGGGCGCTTCCGGCTTCGTCGGGAGCGCGGTTCTGCGCGAGCTGCGCG^ 180 

181 CCGGCTGCGCGCGGTGTCCCGCGGCGGAGCGCCCGCGGTTCCGCCCGGCGCCGCGGAGGT 240 
RLRA VSRGGAPAVPPGAAEV 

241 CGAGGACCTGCGCGCCGACCTGCTGGAACCGGGCCGGGCCGCCGCCGCGATCGAGGACGC 300 
EDLRADLLEPGRAAAA-EOA 

301 CGACGTGATCGTGCACCTGGTGGCGCACGCAGCGGGCGGTTCCACCTGGCGCAGCGCCAC 360 
D V I VH LV AHAAGG S TWKJsa 

3.1 CTCCGACCCGG^ «« 

4 21 GCACGATCGCCGCAGGTCGACGCCGCCCGTGTTGCTCTACGCGAGCACCGCACAGGCCGC 
HDRRRSTPPVLL^ ASTAUA" 

481 GAACCCGTCGGCGGCCAGCAGGTACGCGCAGCAGAAGACCGAGGCCGA 

541 CAAAGCCACCGACGAGGGCCGGGTGCGCGGCGTGATCCTGCGGCTGCCCGCGGTCTACGG 600 
KA .TDEGRVRGVILRLPAVIL. 



480 



540 



660 



601 CCAGAGCGGCCCGTCCGGCCCCATGGGGCGGGGCGTGGTCGCAGCGATGATCCGGCGTGC 
QSGPSGPMGRGVVAAMIRRA 

661 CCTCGCCGGCGAGCCGCTCACCATGTGGCACGACGGCGGCGTGCGCCGCGACCTGCTGCA 720 



780 



L A G E P L T M « H D G G V R R D L L H 

721 ' CGTCGAGGACGTGGCCACCGCGTTCGCCGCCGCGCTGGAGCACCACGACGCGCTGGCCGG 
VEDVATAFAAALEHHDALAt. 

781 CGGCACGTGGGCGCTGGGCGCCGACCGATCCGAGCCGCTCGGCGACATCTTCCGGGCCGT 840 
GTWALG ADRSEPLGDIFRAV 

841 CTCCGGCAGCGTCGCCCGGCAGACCGGCAGCCCCGCCGTCGACGTGG^ 900 

901 GCCCGAGCACGCGGAGGCCAACGACTTCCGCAGCGACGACATCGACTCCACCGAGTTCCG 9 60 
PEHAEANDFRSDDIDSTEI 
961 CAGCCGGACCGGCTGGCGCCCCCGGGTTTCCCTCACCGACGGCATCGACCGGACGGTGGC 1020 
SRTGWRPRVSLTDGIDRTVA 

1021 CGCCCTGACCCCCACCGAGGAGCACTAGTGCGGGTACTGCTGACGTCCTTCGCGCACCGC 10 80 
AL TPTEEH* 

VRVLLTSFAHR 

10 81 ACGCACTTCCAGGGACTGGTCCCGCTGGCGTGGGCGCTGCGCACCGCGGGTCACGACGTG 1140 
T HFQ GLVP LAWALRTAGHDV 

1141 CGCGTGGCCGCCCAGCCCGCGCTC ACCGACGCGGTCATCGGCGCCGGTCTCACCGCGGTA 1200 
RVAAQP ALTDAVIGAGLTAV 

1201 CCCGTCGGCTCCGACCACCGGCTGTTCGACATCG7CCCGGAAGTCGCCGCTCAGGTGCAC 1260 
PVGSDHRLFDIVPEVAAQVH 

1261 CGCTACTCCTTCTACCTGGACTTCTACCACCGCGAGCAGGAGCTGCACTCGTGGGAGTTC 1320 
RYSFYLDFYHREQELHSWEF 

1321 CTGCTCGGCATGCAGGAGGCCACCTCGCGGTGGGTATACCCGGTGGTCAACAACGACTCC 138 0 
LLGMQEATSRWVYPVVNNDS 

1381 TTCGTCGCCGAGCTGGTCGACTTCGCCCGGGACTGGCGTCCTGACCTGGTGCTCTGGGAG 1440 
F V A E LVD F ARDW .RP D LVLWE 

14 41 CCGTTCACCTTCGCCGGCGCCGTCGCGGCCCGGGCCTGCGGAGCCGCGCACGCCCGGCTG 1500 

P F T F AG A . V AARACGAAHARL 

15 01 CTGTGGGGCAGCGACCTC ACCGGCTACTTCCGCGGCCGGTTCCAGGCGCAACGCCTGCGA 15 60 

I* W G S D LTG YFRGRFQAQRLR 

1561 CGGCCGCCGGAGGACCGGCCGGACCCGCTGGGCACGTGGCTGACCGAGGTCGCGGGGCGC 1620 
RPPEDRPDPLGTWLTEVAGR 

1621 TTCGGCGTCGAATTCGGCGAGGACCTCGCGGTCGGGCAGTGGTCGGTCGACCAGTTGCCG 1680 
FGVEFGED LAVGQWSVDQLP 

16 81 CCGAGTTTCCGGCTGGACACCGGAATGGAAACCGTTGTCGCGCGGACCCTGCCCTACAAC 17 4 0 

PSFRLDTGMETVVARTLPYN 

17 41 GGCGCGTCGGTGGTTCCGGACTGGCTCAAGAAGGGCAGTGCGACTCGACGCATCTGCATT 180 0 

GASVVP DWLKKGSATRRICI 

18 01 ACCGGAGGGTTCTCCGGACTCGGGCTCGCCGCCGATGCCG ATCAGTTCGCGCGGACGCTC 1860 

TGGF SGLGLAADADQFAR-T L 
1861 GCGCAGCTCGCGCGATTCGATGGCGAAATCGTGGTTACGGGTTCCGGTCCGGATACCTCC 1920 
AQLARFDGEIVVTGSGPDTS 



1921 GCGGTACCGG AC AACATTCGTTTGGTGGATTTCGTTCCGATGGGCGTTCTGCTCCAGAAC 1980 
AVP DNIRLVDFVPMGVLLQN 



19 81 TGCGCGGCGATC ATCCACCACGGCGGGGCCGGAACCTGGGCC ACGGCACTGCACCACGGA 2040 
CAAI IHHGGAGTWATALHHG 



20 41 ATTCCGCAAAT ATCAGTTGCACATGAATGGGATTGCATGCTACGCGGCCAGCAGACCGCG 2100 
I P Q I SVAHEWDCMLRGQQTA 



2101 GAACTGGGCGCGGGAATCTACCTCCGGCCGGACGAGGTCGATGCCGACTCATTGGCGAGC 2160 
ELGAGIYLRPDEVDADSLAS 



2161 GCCCTCACCCAGGTGGTCGAGGACCCCACCTACACCGAGAACGCGGTGAAGCTTCGCGAG 2220 
ALTQVVEDPTYTENAVKLRE 



2221 G AGGCGCTGTCCGACCCGACGCCGCAGGAG ATCGTCCCGCGACTGGAGGAACTC ACGCGC 228 0 
EALS DPTPQE I VPRLEELTR 



22 81 CGCCACGCCGGCTAGCGGTTTCCGACCGACAAGTCCGTCCGACAGCACACCTCCGG AGGG 2 34 0 
R H A G 



2 341 AGCAGGGATGTACGAGGGCGGGTTCGCCGAGCTTTACGACCGGTTCTACCGCGGCCGGGG 2 4 0 0 
MYEG-GFAELYDRFYRGRG 



2 4 01 C AAGGACT ACGCGGCCGAGGCCGCGCAGGTCGCGCGGCTGGTCAGAGACCGCCTGCCCTC 2 4 60 
KDYAAEAAQVARLVRDRLPS 



24 61 GGCTTCCTCGCTGCTCGACGTGGCCTGCGGGACCGGCACCCACCTGCGCCGGTTCGCCGA 2 520 
ASS LLDVACGTGTHLRRFAD 



2521 CCTCTTCGACGACGTGACCGGGCTGGAGCTGTCGGCGGCGATGATCGAGGTCGCCCGGCC 258 0 
LFDDVTGLELSAAMIEVARP 



2581 GCAGCTCGGCGGCATCCCGGTGCTGCAGGGCGACATGCGCGACTTCGCGCTGGATCGCGA 2 64 0 
Q LG G I PVLQGDMRD FALD.RE 



2641 GTTCGACGCCGTCACCTGCATGTTCAGCTCCATCGGGCAC ATGCGCGACGGCGCCG AGCT 270 0 
FDAVTCMFSS IGHMRDGAEL 



2701 GGACCAGGCGCTGGCGTCCTTCGCCCGCCACCTCGCCCCCGGCGGCGTCGTGGTGGTCGA 2760 
DQA LASFARH LAPGGVVVVE 



27 61 ACCGTGGTGGTTCCCGGAGGACTTCCTCGACGGCTACGTGGCCGGTGACGTGGTGCGCGA 2 820 
PWWFP EDFLDGYVAGDVVRD 
2821 CGGCGACCTGACGATCTCGCGCGTCTCGCACTCCGTGCGCGCCGGCGGCGCGACCCGGAT 2880 
GDLTI SRVSHSVRAGGATRM 

2881 GGAGATCCACTGGGTCGTGGCCGACGCGGTGAACGGTCCGCGGCACCACGTGGAGCACTA 2940 
EIHWVVADAVNGPRHHVEHY 

2 9 41 CGAGATCACGCTCTTCGAGCGGCAGCAGTACGAGAAGGCCTTCACCGCGGCCGGTTGCGC 3000 
EIT LFERQQYEKAFTAAGCA 

3001 TGTGCAGT ACCTGGAGGGCGGACCCTCCGG ACGCGGGTTGTTCGTCGGTGTGCGCGGATG 3060 
VQYLEGGPSGRGLFVGVRG* 

30 61 ACCCGTGCGTCGCGTTTTCCGTTCCTGGCACAGGTGATCCGCTCCACGGGCCCTTTCCCC 3120 

3121 GCCGTGACCGGACCCTTACAGTGAGTGCGGGTCTTGATCGACAACGCCCGGCGGCAGCAA 318 0 

3181 GCGGAGCCGTCGACGACACCGCAGGGAGAGTCGATGGGTGATCGGACCGGCGACCGGACG 324 0 

MGDRTGDRT 

32 41 ATTCCGGAATCCTCGCAGACCGCAACGCGTTTCCTGCTCGGCGACGGCGGAATCCCCACC 3300 

I P E S S QTATRF L LGDGGI PT 

3301 GCCACGGCGGAAACCCACGACTGGCTGACCCGCAACGGCGCCGAGCAGCGGCTCGAGGTG 3360 
ATAETHDWLTRNGAEQRLEV 

33 61 GCGCGCGTGCCGTTCAGCGCCATGGACCGCTGGTCGTTCCAGCCCGAGGACGGCAGGCTC 3420 

ARVPFSAMDRWSFQPSDGRL 

34 21 GCCCACGAGTCCGGGCGCTTCTTCTCCATCGAGGGCCTGCACGTGCGGACGAACTTCGGC 3 4 80 

AHESGRFFSIEGLHVRTNFG 

34 81 TGGCGGCGGGACTGGATCCAGCCCATCATCGTGCAGCCCGAGATCGGCTTCCTCGGCCTC 3 54 0 
WRRDWIQP I IVQPEIGFLGL 

3541 ATCGTCAAGGAGTTCGACGGTGTGCTGCACGTGCTGGCGCAGGCCAAGGCCGAGCCGGGC 3 600 
IVKEFDGVLHVLAQAKAEPG 

3601 AACATCAACGCCGTCCAGCTCTCCCCGACCCTGCAGGCGACCCGCAGCAACTACACCGGC 3 660 
NINAVQLSPTLQATRSNYTG 

36 61 GTCCACCGCGGCTCGAAGGTCCGGTTC ATCGAGT ACTTCAACGGCACGCGCCCG AGCCGG 3720 
VHRGSKVRFIEYFNGTRPSR 

3721 ATCCTCGTCGACGTGCTCCAGTCCGAGC AGGGCGCGTGGTTCCTGCGCAAGCGCAACCGG 37 8 0 
ILVDVLQSEQGAWFLRKRNR 
3781 AACATGGTCGTCGAGGTGTTCGACGACCTGCCCGAGCACCCGAACTTCCGGTGGCTGACC 3840 

NMVVEVFDDLPEHPNFRWLT 

38 41 GTCGCGCAGCTGCGGGCGATGCTGCACCACGACAACGTGGTGAACATGGACCTGCGCACC 3 900 

VAQLRAMLHHDNVVNMDLRT 

3901 GTGCTGGCCTGCGTCCCGACCGCCGTGGAGCGGGACCGGGCCGACGACGTGCTCGCGCGC 3 9 60 
V L AC VP T AVE R D RAD DV.L AR 

39 61 CTGCCCGAGGGCTCGTTCCAGGCCCGGCTGCTGCACTCGTTCATCGGCGCGGGCACCCCG 4 020 

LP EGSFQARLLHSFIGAGTP 

4 0 21 GCCAAC AACATG AACAGCCTGCTGAGCTGGATCTCCGACGTGCGCGCCAGGCGCGAGTTC 4080 

ANNMNS LLSWI SDVRARREF 

4081 GTGC AGCGCGGCCGCCCGCT GCCCGACATCGAGCGCAGCGGGTGGATCCGCCGCGACGAC 414 0 
VQRGRP LPDIERSGWIRR DD 

4141 GGCATCGAGCACGAGGAGAAGAAGTACTTCGACGTCTTCGGCGTCACGGTGGCGACCAGC 4200 
GI EHEEKKYFDVFGVTVATS 

4 201 GACCGCGAGGTC AACTCGTGGATGCAGCCGCTGCTCTCGCCCGCCAACAACGGCCTGCTC 4 2 60 
DREVNSWMQPLLSPANNGLL 

42 61 GCCCTGCTGGTCAAGGACATCGGCGGCACGTTGCACGCGCTCGTGCAGCTGCGCACCGAG 4 320 
ALLVKD IGGTLHALVQLRTE 

4 321 GCGGGCGGGATGGACGTCGCCGAGCTGGCGCCTACGGTGCACTGCCAGCCCGACAACT AC 4 380 
AGGMDVAELAP TVHCQPDNY 

4 381 GCCGACGCGCCCGAGGAGTTCCGACCGGCCTATGTGGACTACGTGTTGAACGTGCCGCGC 4 4 4 0 
ADAPEEFRPAYVDYVLNVPR 

4 4 41 TCGCAGGT CCGCTACGACGCATGGCACTCCGAGGAGGGCGGCCGGTTCTACCGCAACGAG 4 500 
SQVRY.DAWHSEEGGRFYRNE 

4 501 AACCGGTACATGCTGATCGAGGTGCCCGCCGACTTCGACGCCAGTGCCGCTCCCGACCAC 4 560 
NRYMLI EVPADFDASAAPDH 

4561 CGGTGGATGACCTTCGACCAGATCACCTACCTGCTCGGGCACAGCCACTACGTCAACATC 4 620 
RWMTFDQITY LLGHSHYVNI 

4621 CACGTGCGCAGCATCATCGCGTGCGCCTCGGCCGTCTACACCAGGACCGCCGGATGAAAC 4680 
HVRS I IACASAVYTRTAG* 

M K R 

4 681 GCGCGCTGACCGACCTGGCGATCTTCGGCGGCCCCGAGGCATTCCTGCACACCCTCTACG 4740 
ALTDLAIFGGPEAF-LHTLYV 
4741 TGGGCAGGCCGACCGTCGGGGACCGGGAGCGGTTCTTCGCCCGCCTGGAGTGGGCGCTGA 4800 
GRPTVGDRERFFARLEWALN 

4 8 01 ACAACAACTGGCTG ACCAACGGCGGACCACTGGTGCGCGAGTTCGAGGGCCGGGTCGCCG 4 860 
NNWLTNGGPLVREFEG RVAD 

48 61 ACCTGGCGGGTGTCCGCCACTGCGTGGCCACCTGCAACGCGACGGTCGCGCTGCAACTGG 4 920 
LAGVRHCVATCNATVALQLV 

4 921 TGCTGCGCGCGAGCGACGTGTCCGGCGAGGTCGTCATGCCTTCGATGACGTTCGCGGCCA 4 980 
L. RASDVSGEVVMP S MTFAAT 

4 981 CCGCGCACGCGGCGAGCTGGCTGGGGCTGGAACCGGTGTTCTGCGACGTGGACCCCGAGA 504 0 
AHAASWLGLEPVFCDVDPET 

50 41 CCGGCCTGCTCGACCCCGAGCACGTCGCGTCGCTGGTCACACCGCGGACGGGCGCGATCA 5100 
GLLDPEHVASLVTP RTGAII 

5101 TCGGCGTGCACCTCTGGGGC AGGCCCGCTCCGGTCGAGGCGCTGGAGAAGATCGCCGCCG 5160 
GVHLWGRPAPVEALEKIAAE 

5161 AGCACCAGGTCAAACTCTTCTTCGACGCCGCGCACGCGCTGGGCTGCACCGCCGGCGGGC 5220 
HQVKLFFDAAHALGCTAGGR 

5221 GGCCGGTCGGCGCCTTCGGCAACGCCGAGGTGTTCAGCTTCCACGCCACGAAGGCGGTCA 5280 
P VGAFGNAEVFS FH ATKAVT 

52 81 CCTCGTTCGAGGGCGGCGCC ATCGTCACCGACGACGGGCTGCTGGCCG ACCGCATCCGCG 534 0 

SFEGGAIVTDDGLLADRIRA 

53 41 CCATGC AC AACTTCGGGATCGCACCGGACAAGCTGGTGACCGATGTCGGCACCAACGGCA 5 400 

M HNFGIAPDKLVTDVGTN GK 

5401 AGATGAGCGAGTGCGCCGCGGCGATGGGCCTCACCTCGCTCGACGCCTTCGCCGAGACCA 5 460 
MS ECAAAMGLTSLDAFAE TR 

54 61 GGGTGCACAACCGCCTCAACCACGCGCTCTACTCCGACGAGCTCCGCGACGTGCGCGGCA 5520 

VHNRLNHALYSDELRDVRGI 

5521 T ATCCGTGCACGCGTTCGATCCTGGCGAGC AGAACAACTACC AGT ACGTGATCATCTCGG 5580 
SVHAFDPGEQNNYQYVIISV 

5581 TGGACTCCGCGGCCACCGGCATCGACCGCGACCAGTTGCAGGCGATCCTGCGAGCGGAGA 5640 
DSAATGIDRDQLQAILRAEK 

56 41 AGGTTGTGGC AC AACCCT ACTTCTCCCCCGGGTGCCACCAG ATGC AGCCGTACCGGACCG 5700 
VVAQPYFSPGCHQMQP Y R T E 



FIG. 4B-6 

WO 97/23630    PCT/US96/20238 

5701 AGCCGCCGCTGCGGCTGGAGAACACCGAACAGCTCTCCGACCGGGTGCTCGCGCTGCCCA 5760 
PPLRLENTEQLSDRVLALPT 



57 61 CCGGCCCCGCGGTGTCCAGCGAGGACATCCGGCGGGTGTGCGACATCATCCGGCTCGCCG 5 82 0 
GPAVS SEDIRRVCD I IRLAA 



58 21 CCACCAGCGGCGAGCTGATCAACGCGCAATGGGACCAGAGGACGCGCAACGGTTCGTGAC 5880 
TSGELINAQWDQRTRNGS * 



58 81 GACCTGCGCCACAAGTGCCAGGAGGTTCGCTCCCCGATGAACACAACTCGTACGGCAACC 5 94 0 

MNTTRTA T 



5941 GCCCAGGAAGCGGGGGTCGCCG ACGCGGCGCGCCCGGACGTCGACCGGCGGGCGGTCGTG 6000 
AQEA GVADAARP DVDRRAVV 



6001 CGGGCGCTGAGCTCGGAGGTCTCCCGCGTCACCGGCGCCGGTGACGGTGACGCCCACGTG 60 60 
RALSSEVSRVTGAGDGDAHV 



60 61 CAGGCCGCCCGGCTCGCCGACCTCGCCGCGCACTACGGGGCGCACCCGTTCACGCCGCTG 6120 
QAARLADLAAHYGAHPFTPL 



6121 GAGCAGACGCGTGCGCGGCTCGGCCTGGACCGCGCGGAGTTCGCCCACCTGCTCGACCTG 6180 
EQTRARLGLDRAEFAHLLDL 



6181 TTCGGCCGCATCCCGGACCTGGGCACCGCGGTGGAGCACGGTCCGGCGGGCAAGTACTGG 624 0 
FGRIPDLGTAVEHGPAGKYW 



62 41 TCCAACACGATCAAGCCGCTGGACGCCGCAGGCGCACTGGACGCGGCGGTCTACCGCAAG 6300 
S NT I KP LDAAGALDAAVY RK 



6301 CCTGCCTTCCCCTACAGCGTCGGCCTGT AC CCCGGGCCGACGTGCATGTTCCGCTGCCAC 6360 
PAFPYSVGLYPGPTCMFRCH 



6361 TTCTGCGTGCGGGTGACCGGTGCCCGCTACGAGGCCGCATCGGTCCCGGCGGGCAACGAG 6420
FCVRVTGARYEAASVPAGNE



6421 ACGCTGGCCGCGATCATCGACGAGGTGCCCACGGACAACCCGAAGGCGATGTACATGTCG 6480
TLAAIIDEVPTDNPKAMYMS



6481 GGCGGGCTCGAGCCGCTGACCAACCCCGGTCTCGGCGAGCTGGTGTCGCACGCCGCCGGG 6540
GGLEPLTNPGLGELVSHAAG



6541 CGCGGTTTCGACCTCACCGTCTACACCAACGCCTTCGCCCTCACCGAGCAGACGCTGAAC 6600
RGFDLTVYTNAFALTEQTLN 



6601 CGCCAGCCCGGCCTGTGGGAGCTGGGCGCGATCCGCACGTCCCTCTACGGGCTGAACAAC 6660
RQPGLWELGAIRTSLYGLNN

6661 GACGAGTACGAGACGACCACCGGCAAGCGCGGCGCTTTCGAACGCGTCAAGAAGAACCTG 6720
DEYETTTGKRGAFERVKKNL 



6721 CAGGGCTTCCTGCGGATGCGCGCCGAGCGGGACGCGCCGATCCGGCTCGGCTTCAACCAC 6780
QGFLRMRAERDAPIRLGFNH



6781 ATCATCCTGCCGGGACGGGCCGACCGGCTCACCGACCTCGTCGACTTCATCGCCGAGCTC 6840
IILPGRADRLTDLVDFIAEL



6841 AACGAGTCCAGCCCGCAACGGCCGCTGGACTTCGTGACGGTGCGCGAGGACTACAGCGGC 6900
NESSPQRPLDFVTVREDYSG



6901 CGCGACGACGGCCGGCTGTCGGACTCCGAGCGCAACGAGCTGCGCGAGGGCCTGGTGCGG 6960
RDDGRLSDSERNELREGLVR



6961 TTCGTCGACTACGCCGCCGAGCGGACCCCGGGCATGCACATCGACCTGGGCTACGCCCTG 7020
FVDYAAERTPGMHIDLGYAL 



7021 GAGAGCCTGCGGCGGGGTGTGGACGCCGAGCTGCTGCGCATCCGGCCGGAGACGATGCGT 7080
ESLRRGVDAELLRIRPETMR



7081 CCCACCGCGCACCCCCAGGTCGCGGTGCAGATCGACCTGCTCGGCGACGTCTACCTCTAC 7140
PTAHPQVAVQIDLLGDVYLY 

7141 CGCGAGGCGGGCTTCCCGGAGCTGGAGGGCGCCACCCGCTACATCGCGGGCCGGGTCACC 7200 
REAGFPELEGATRYIAGRVT



7201 CCGTCGACCAGCCTGCGCGAGGTGGTGGAGAACTTCGTGCTGGAGAACGAGGGCGTGCAG 7260
PSTSLREVVENFVLENEGVQ 



7261 CCCCGCCCCGGCGACGAGTACTTCCTCGACGGCTTCGACCAGTCGGTGACCGCACGGCTC 7320
PRPGDEYFLDGFDQSVTARL 



7321 AACCAGCTCGAACGAGACATCGCCGACGGGTGGGAGGACCACCGCGGCTTCCTGCGCGGA 7380
NQLERDIADGWEDHRGFLRG 



7381 AGGTGAACCGGAGTTGCGAGTACGTGAGCTGGCGGTGGCGGGCGGTTTCGAGTTCACCCC 7440
R * VAGGFEFTP 



7441 CGACCCGAAGCAGGACCGGCGGGGCCTGTTCGTGTCTCCGCTGCAGGACGAGGCGTTCGT 7500
DPKQDRRGLFVSPLQDEAFV



7501 GGGCGCGGTGGGCCATCGGTTCCCCGTCGCCCAGATGAACCACATCGTCTCCGCCCGGGG 7560
GAVGHRFPVAQMNHIVSARG



7561 CGTGCTGCGCGGGCTGCACTTCACCACCACCCCGCCGGGGCAGTGCAAGTACGTCTACTG 7620
VLRGLHFTTTPPGQCKYVYC

7621 CGCGCGCGGCCGGGCGCTCGACGTCATCGTCGACATCCGGGTCGGCTCGCCGACGTTCGG 7680
ARGRALDVIVDIRVGSPTFG



7681 GAAGTGGGACGCGGTGGAGATGGACACCGAGCACTTCCGGGCGGTCTACTTCCCCAGGGG 7740
KWDAVEMDTEHFRAVYFPRG 



7741 CACCGCGCACGCCTTCCTCGCGCTTGAGGACGACACCCTGATGTCGTACCTGGTCAGCAC 7800
TAHAFLALEDDTLMSYLVST



7801 GCCGTACGTGGCCGAGTACGAGCAGGCGATCGACCCGTTCGACCCCGCGCTGGGTCTGCC 7860
PYVAEYEQAIDPFDPALGLP 



7861 GTGGCCCGCGGACCTGGAGGTCGTGCTCTCCGACCGCGACACGGTGGCCGTGGACCTGGA 7920
WPADLEVVLSDRDTVAVDLE 



7921 GACCGCCAGGCGGCGAGGGATGCTGCCCGACTACGCCGACTGCCTCGGCGAGGAGCCCGC 7980
TARRRGMLPDYADCLGEEPA 



7981 CAGCACCGGCAGGTGACGGGTCCCGAGCACGATCTGTTCGAAGTGGCGCAGGCGCTCGTC 8040
STGR*



8041 GTCGCGGTCGA 8051